ABSTRACT


As digitization of data becomes more prevalent, the demands it places on existing communications networks and computer systems become overwhelming. Currently, the speech compression problem is handled using the CELP (Code Excited Linear Prediction) scheme and its derivatives. Such techniques are the most frequently used for speech compression in the medium-to-low rate range. Recent research on cosine packets has shown them to be readily adaptable to speech compression and coding. In this thesis, speech compression schemes are developed using cosine-packet decomposition, minimum entropy basis selection, and an adaptive thresholding scheme for selecting coefficients. In addition, voiced-unvoiced segmentation and a denoising scheme are implemented. Test results show high compression ratios (1:50) with good quality of the reconstructed speech.
I. INTRODUCTION


Speech compression allows smaller bandwidth, higher data rates, or a combination of these attributes. It can also be used to store speech-like data in a compact form.

This thesis develops speech compression schemes based on the Local Trigonometric Transform [2] that use an adaptive thresholding scheme proposed in this work. These schemes first perform a time partition of the original speech data, according to a maximum depth selected by the user. An experimentally derived optimum depth is proposed, based on the results of tests with several words and phonemes (defined in Chapter II). Following the time partitioning, a basis is selected via the minimum entropy best basis algorithm. To perform compression, coefficients are selected according to an adaptive thresholding scheme, which varies the compression percentage depending on the energy and frequency content of each interval. The intervals are classified by a voiced-unvoiced segmentation algorithm. Depending on their classification, coefficients are selected so that more coefficients are preserved for the voiced intervals than for the unvoiced intervals. These coefficients are then encoded using uniform quantizers and Huffman coding to achieve average compression ratios of 1:50. In addition, two denoising schemes are proposed to minimize the effects of equipment noise below 120 Hz, thereby improving the sound quality.

In a typical scenario, users of the proposed schemes will be able to trade speech quality against transmission bandwidth, based on the channel bandwidth currently available. They will be provided with the parameters that maximize the compression ratio and minimize the required bandwidth at an acceptable speech quality. Lower bit-rate coding reduces the transmission bandwidth of the signal and may prove quite useful in partial-band jamming environments, where the available channel bandwidth may be limited. The proposed schemes may therefore be useful for military applications, where understanding the message is more important than the overall quality of the sound. This thesis concentrates on finding the best possible compression ratio while still keeping an acceptable sound quality. In this work, sound quality is defined in terms of a mean square error as well as in terms of a proposed quality measure.
Extensions of the proposed techniques lead to data storage improvements, and they can also easily be adapted to cryptographic applications. The thesis is organized as follows. Chapter II presents an introduction to speech processing, where the concepts of phonemes and coarticulation effects are introduced and illustrated. Chapter III introduces the Local Trigonometric Transform and presents the Local Cosine Transform adopted in our work. The Local Cosine Transform can be viewed as a basic building block for the more complex Cosine Packet Transform, which has recently been used in speech applications [2]. The Cosine Packet Transform can also be viewed as a dual operation of the Wavelet Packet Transform [2]. Both packet schemes are presented, discussed, and compared in Chapter IV. For compression applications, the Wavelet and Cosine Packet Transforms involve selecting a particular basis that is "best" matched to the signal under study. This choice of basis is carried out via the Best Basis algorithm, which is presented in Chapter V. Chapter VI presents the denoising and compression schemes investigated in this work. Denoising enhances the audio quality of speech signals when noise is present. Chapter VII describes the encoding schemes used to compress the speech information. Chapter VIII first discusses the experiments and parameters used to test our denoising and compression schemes. Next, it presents the results obtained using various phonemes, words, and sentences. The database consists of a limited collection of American-English words, some Portuguese words, and some typical voiced and unvoiced sound segments. Some of the more elaborate data sets consist of complete sentences and dialogues. We also compare the compression results obtained with our Cosine Packet scheme with those obtained with the Wavelet Packet scheme using a "Daubechies" basis function [17]. Results show that the Cosine Packet Transform outperforms the Wavelet Packet Transform on the speech segments considered in this study. Finally, Chapter IX contains the conclusions and final considerations. All computer algorithms are listed in the Appendix.
II. INTRODUCTION TO SPEECH PROCESSING


One  of  the  principal  differentiating  features  of  any  speech  sound  is  excitation  [1]. 
Two  elemental  excitation  types  are  present  in  speech  data:  (1)  voiced  and  (2)  unvoiced. 
Voiced sounds have high energy concentrated at low frequencies, while unvoiced sounds have low energy concentrated at high frequencies. Another important characteristic of speech signals is that they are locally stationary.

The basic theoretical unit for describing how speech conveys linguistic meaning is called a phoneme. Each language has its own set of phonemes. For example, American English has about 42 phonemes, while Brazilian Portuguese (Rio de Janeiro region) has about 51. Phonemes comprise vowels, semivowels, diphthongs, and consonants. In general, the duration of a phoneme may vary from 15 to 400 milliseconds, depending on the sound produced and the way it is pronounced. Vowels, for example, vary widely in duration, typically from 40 to 400 milliseconds.

The  transition  from  one  phoneme  to  another  is  not  made  abruptly  or 
independently  of  adjacent  phonemes.  Actually,  adjacent  phonemes  have  a  strong 
influence  on  the  manner  in  which  the  transition  takes  place.  The  term  used  to  refer  to  the 
change  in  phoneme  articulation  and  acoustics  that  is  caused  by  the  influence  of  another 
phoneme  is  coarticulation. 

Since this research investigates speech compression, there are two main requirements. First, we need to be able to split a speech signal into its smallest locally stationary "cells," constituted by phonemes, and represent them minimally with good fidelity. Second, we need to preserve coarticulation effects as much as possible (i.e., we need to preserve the smooth transition from one phoneme to the next).

Figure 2.1 illustrates the coarticulation process for the sound /issos/. The top plot shows the time-domain speech signal. The middle plot shows the voiced and unvoiced portions of this sound, obtained using the zero-crossing rate and the short-time energy of the sound [1]. The unvoiced portions are -ss- and -s-, corresponding to the phoneme /s/. The high frequency and low energy of the unvoiced segments appear as low short-time energy and high zero-crossing rates. The voiced portions of the sound are the phonemes /i/ and /o/. The low frequency and high energy of the voiced phonemes appear as high short-time energy and a low zero-crossing rate. The bottom plot shows the spectrogram obtained using a Hanning time window of length 256 samples with an overlap of 128 samples. Note the coarticulation effects, which allow smooth transitions between phonemes. For example, the transition from /i/ to /s/ occurs through a "link" in a high-frequency portion of the spectrum, which is an example of anticipatory (or right-to-left) coarticulation. This means that the articulator has moved from the present phoneme (/i/) toward a position (higher frequency) that is more appropriate for the following phoneme (/s/).
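The voiced/unvoiced decision in the middle plot of Figure 2.1 rests on two frame-wise measurements, short-time energy and zero-crossing rate. The Python sketch below shows one way to compute them and combine them into a decision; it is an illustration, not the segmentation algorithm used in this thesis. The frame length (256 samples), hop (128 samples), and both thresholds (`zcr_max`, `rel_energy_min`) are hypothetical values chosen only for this example.

```python
import numpy as np

def frame_signal(x, frame_len=256, hop=128):
    """Split x into overlapping frames; a trailing partial frame is dropped."""
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx]

def short_time_energy(frames):
    """Mean squared amplitude of each frame."""
    return np.mean(frames ** 2, axis=1)

def zero_crossing_rate(frames):
    """Fraction of adjacent sample pairs in each frame whose signs differ."""
    signs = np.signbit(frames)
    return np.mean(signs[:, 1:] != signs[:, :-1], axis=1)

def classify_voiced(x, frame_len=256, hop=128, zcr_max=0.25, rel_energy_min=0.1):
    """True for frames with a low zero-crossing rate and high relative energy (hypothetical thresholds)."""
    frames = frame_signal(np.asarray(x, dtype=float), frame_len, hop)
    energy = short_time_energy(frames)
    zcr = zero_crossing_rate(frames)
    return (zcr < zcr_max) & (energy > rel_energy_min * energy.max())

# Synthetic stand-ins sampled at 8 kHz: a 150 Hz tone ("voiced") followed by weak noise ("unvoiced").
fs = 8000
t = np.arange(4096) / fs
rng = np.random.default_rng(0)
x = np.concatenate([np.sin(2 * np.pi * 150 * t), 0.05 * rng.standard_normal(4096)])
labels = classify_voiced(x)
```

A 150 Hz tone sampled at 8 kHz changes sign on roughly 4% of adjacent sample pairs, whereas white noise does so on about half of them, so the two measures separate these synthetic cases easily; real speech would require thresholds tuned on data.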
Figure 2.1 Sound "ISSOS," male non-native speaker; top plot: time-domain representation; middle plot: short-time energy and zero-crossing representation; bottom plot: spectrogram of "ISSOS" using a Hanning time window of length 256 samples with overlap 128, fs = 8 kHz.
III. THE LOCAL TRIGONOMETRIC TRANSFORM


This chapter discusses the main concepts of Local Trigonometric Transform theory and its implementation. Much of the mathematical rigor is omitted, and emphasis is placed on the basic theory and its application to speech processing. The chapter is divided into six sections. The first provides an introduction, and the second presents some basic definitions concerning the rising cutoff function. The third section defines the folding and unfolding operations used by the transform [2]. The fourth describes the Continuous Transform and its main mathematical properties. The fifth defines the Discrete Transform. Finally, the last section applies these concepts and describes how the transform may be performed with orthonormal bases to allow for signal analysis and synthesis.

A.  INTRODUCTION 

In order to analyze small portions of the speech signal, the signal must be partitioned in time. The local transform defined in this chapter uses a "local cosine," a basis function that allows the signal to be cut into time slices. As first defined by Malvar in 1987 [3], the "local cosines" provided a regularly spaced partition in time. Later, Coifman and Meyer [4] and Meyer [5] tackled the problem of modifying the regular constructions to obtain windows of variable length that could be defined arbitrarily. They began by partitioning time into adjacent intervals $[\alpha_j, \alpha_{j+1}]$, as illustrated in Figure 3.1. Figure 3.2 shows in more detail how the windows may be combined while still preserving the smoothness and integrity of the signal. The windows used are essentially the intervals $[\alpha_j, \alpha_{j+1}]$, and the disjoint intervals $[\alpha_j - \varepsilon_j, \alpha_j + \varepsilon_j]$ allow adjacent windows to overlap. In summary, the local cosines (called "Malvar wavelets") are constructed with a rise lasting $2\varepsilon_j$, a stationary period, and a decay lasting $2\varepsilon_{j+1}$. The ability to choose the durations of the rise, the decay, and the stationary section arbitrarily and independently is exactly what makes the Malvar wavelets different from other well-known wavelets (e.g., Gabor or Daubechies) [5]. Of course, this ability must be used efficiently. This choice is discussed in the following chapters, where we focus on the best basis for decomposing the signal.


Figure 3.1 Arbitrary partition of time into adjacent intervals


Figure 3.2 Overlapping windows of arbitrary size


B. THE RISING CUTOFF FUNCTION

The well-known Discrete Cosine Transform (DCT) has, as its basis function, a "block cosine" (i.e., a cosine multiplied by a rectangular window). Because the rectangular window cuts the cosine off abruptly, each block cosine is discontinuous at its block boundaries, and so is a signal reconstructed from quantized block cosine coefficients. The resulting effects include the so-called "blocking effect" in image coding and "clicking sounds" in speech coding [6]. These problems can be avoided by defining a window based on a function that allows a smooth transition from zero to the amplitude of the cosine (at the left edge) and from that amplitude back to zero (at the right edge).

The function $r = r(t)$ is taken in the class $C^d(\mathbb{R})$, for some $0 \le d \le \infty$, and satisfies the following conditions:

$$
|r(t)|^2 + |r(-t)|^2 = 1 \quad \text{for all } t \in \mathbb{R}; \qquad
r(t) = \begin{cases} 0, & \text{if } t \le -1,\\ 1, & \text{if } t \ge 1. \end{cases} \tag{3.1}
$$

It is called a rising cutoff function because $r(t)$ rises from zero, for $t \le -1$, to one, for $t \ge 1$. The function is presented in Figure 3.3.


Figure  3.3  The  rising  cutoff  function 
C. FOLDING AND UNFOLDING


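The folding operator $U$ and its adjoint, the unfolding operator $U^*$, are defined as follows:

$$
Uf(t) = \begin{cases} r(t)f(t) + r(-t)f(-t), & \text{if } t > 0,\\ r(-t)f(t) - r(t)f(-t), & \text{if } t < 0, \end{cases} \tag{3.2}
$$

$$
U^*f(t) = \begin{cases} r(t)f(t) - r(-t)f(-t), & \text{if } t > 0,\\ r(-t)f(t) + r(t)f(-t), & \text{if } t < 0. \end{cases} \tag{3.3}
$$

Observe that $Uf(t) = f(t)$ and $U^*f(t) = f(t)$ if $t > 1$ or $t < -1$. Also, $U^*Uf(t) = UU^*f(t) = \left(|r(t)|^2 + |r(-t)|^2\right) f(t) = f(t)$ for all $t \neq 0$, so that $U$ and $U^*$ are unitary isomorphisms of $L^2(\mathbb{R})$. This means that each operator is the inverse of the other.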
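Equations (3.2) and (3.3) translate almost literally into code when $f$ is sampled on a grid symmetric about $t = 0$, because $f(-t)$ is then just the reversed array. The Python sketch below assumes the sine cutoff of Equation (3.5) as $r$ (any function satisfying (3.1) would do) and a grid that avoids $t = 0$. It checks numerically that unfolding undoes folding and that folding leaves $f$ unchanged outside $(-1, 1)$.

```python
import numpy as np

def r_sin(t):
    """Sine rising cutoff (Eq. 3.5), used here as the r of Eq. (3.1)."""
    t = np.clip(t, -1.0, 1.0)
    return np.sin(np.pi / 4 * (1 + t))

def fold(f, t, r=r_sin):
    """Folding operator U of Eq. (3.2); f is sampled on a grid with t[::-1] == -t."""
    f_ref = f[::-1]  # samples of f(-t)
    return np.where(t > 0, r(t) * f + r(-t) * f_ref, r(-t) * f - r(t) * f_ref)

def unfold(f, t, r=r_sin):
    """Unfolding operator U* of Eq. (3.3), the inverse of fold."""
    f_ref = f[::-1]
    return np.where(t > 0, r(t) * f - r(-t) * f_ref, r(-t) * f + r(t) * f_ref)

M = 200
t = (np.arange(-M, M) + 0.5) / M * 2.0  # symmetric grid on (-2, 2) that skips t = 0
f = np.cos(3 * t) + t
print(np.allclose(unfold(fold(f, t), t), f))  # True: U*U f = f
```

Each pair of samples at $t$ and $-t$ is transformed by a 2 x 2 rotation whose entries are $r(t)$ and $r(-t)$, which is why folding also preserves the energy of $f$.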

Figure  3.4  illustrates  the  unfolding  operation  on  a  block  cosine.  Figure  3.4a  shows 
a  block  cosine.  Figures  3.4b  and  3.4c  illustrate  the  cosine  unfolded  at  its  left  edge,  and 
unfolded  at  both  edges,  respectively. 

Figure 3.5 presents a block cosine and a block sine after periodic folding. The purpose of folding is to prepare the function on each interval so that adjacent windows can later be overlapped without changing the function in the overlapping region.
Figure 3.4 Unfolding operator applied to a block cosine: (a) block cosine; (b) left edge unfolded; (c) both edges unfolded

Figure 3.5 (a) Block cosine and (b) block sine, both after periodic folding


To extend the concept of folding and unfolding to an interval, the operators can be shifted and dilated so that their action takes place on an arbitrary interval $(\alpha - \varepsilon, \alpha + \varepsilon)$. After the time axis has been partitioned by periodically folding the left and right edges of each interval, all adjacent component windows can be unfolded and overlapped. The window formed by the rising cutoff function is called a bell. Figure 3.6 displays two small bells (called child bells) overlapped, with one inverted large bell (called the parent bell) below them. It shows that both the smoothness between intervals and the signal integrity (no loss of information) are preserved if each interval is unfolded and then overlapped. This explains how a parent window may be split into two child windows (in the decomposition phase), and how two child windows may be combined to form one parent window (in the reconstruction phase). This property is particularly important when the concept of "cosine packets" is introduced.
Figure 3.6 Two child bells overlapped and one inverted parent bell

D.  THE  CONTINUOUS  LOCAL  TRIGONOMETRIC  TRANSFORM 
1.  Properties 

The time window used in the Local Trigonometric Transform can have both smoothness and a controlled length, so that properties such as time and frequency resolution can also be controlled. This is done simply by changing the equation of the window. By combining windows of arbitrary size (represented by local cosines, i.e., block cosines unfolded at both edges), it is possible to obtain a smooth orthogonal basis. Observe that each window is well localized in time as well as in frequency. Its temporal support is the interval $[\alpha_j - \varepsilon_j, \alpha_{j+1} + \varepsilon_{j+1}]$, so its position uncertainty is at most the width of that interval (Figure 3.2). Figure 3.7 presents three different bells. Figure 3.8 presents the positive half of the real part of the Fourier Transform of the functions given in Figure 3.7. Note that the sidelobes increase as the roll-off of the time window becomes sharper.


2.  The  Local  Transform 

The Continuous Local Trigonometric Transform is based on a set of orthonormal basis functions that allow a variable-length time window while still maintaining a small time-frequency bandwidth product. The transform can use either local cosines or local sines. Since the local cosine has been chosen in this work, the so-called "block cosine" at half-integer frequency is defined as follows:

$$
C_n(t) = \cos\left[\pi\left(n + \tfrac{1}{2}\right)t\right], \tag{3.4}
$$

where $n$ is an integer and $t$ is restricted to the interval $[0, 1]$.

As can be observed from the right side of Figure 3.4c, unfolding the block cosine at its edges gives it the smoothness that contributes to the good frequency resolution of the transform. In practice, smoothness is obtained with a smooth cutoff built by sine iteration [2], defined by

$$
r_{\sin}(t) = \begin{cases} 0, & \text{if } t \le -1,\\ \sin\left[\frac{\pi}{4}(1 + t)\right], & \text{if } -1 < t < 1,\\ 1, & \text{if } t \ge 1, \end{cases} \tag{3.5}
$$

and

$$
r_{[0]}(t) = r_{\sin}(t), \qquad r_{[n+1]}(t) = r_{[n]}\!\left(\sin\frac{\pi t}{2}\right) \quad \text{for } -1 < t < 1, \tag{3.6}
$$

with each $r_{[n]}$ equal to 0 for $t \le -1$ and to 1 for $t \ge 1$, as required by Equation (3.1).

Since $r_{[1]}$ is smooth on $(-1, 1)$ with a vanishing first derivative at the boundary points $t = \pm 1$, the envelope (referred to as the bell in [5]) has a continuous derivative on $\mathbb{R}$. Based on the recursion in Equation (3.6), $r_{[n]}$ with $n > 1$ can be used to obtain additional continuous derivatives [5]. In fact, it can be shown that $r_{[n]}(t)$ has $2^n - 1$ vanishing derivatives at $t = \pm 1$. This work uses $r_{[1]}$, since it gives good resolution and has very small sidelobes.
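Equations (3.5) and (3.6) are easy to check numerically. The Python sketch below is an illustration, not the thesis code: it implements $r_{\sin}$ and the iterated cutoffs $r_{[n]}$ (held constant outside $[-1, 1]$, as Equation (3.1) requires), verifies the identity of Equation (3.1), and estimates the one-sided slope at $t = -1$. The slope is about $\pi/4$ for $r_{[0]}$ but essentially zero for $r_{[1]}$, which is the vanishing first derivative that gives the bell a continuous derivative.

```python
import numpy as np

def r_sin(t):
    """Sine rising cutoff of Eq. (3.5): 0 for t <= -1, sin(pi/4 (1+t)) on (-1, 1), 1 for t >= 1."""
    t = np.clip(np.asarray(t, dtype=float), -1.0, 1.0)
    return np.sin(np.pi / 4 * (1 + t))

def r_iter(t, n):
    """Iterated cutoff r_[n] of Eq. (3.6), unrolled as r_sin(s(s(...s(t)))) with s(t) = sin(pi t / 2)."""
    t = np.clip(np.asarray(t, dtype=float), -1.0, 1.0)  # r_[n] is constant outside [-1, 1]
    for _ in range(n):
        t = np.sin(np.pi * t / 2)
    return r_sin(t)

t = np.linspace(-1.5, 1.5, 3001)
for n in range(3):
    # Eq. (3.1): |r(t)|^2 + |r(-t)|^2 = 1 for every t
    print(n, np.allclose(r_iter(t, n) ** 2 + r_iter(-t, n) ** 2, 1.0))

h = 1e-4
for n in range(2):
    slope = (r_iter(-1 + h, n) - r_iter(-1.0, n)) / h  # one-sided slope at t = -1
    print(f"r_[{n}] slope at t = -1: {float(slope):.4f}")
```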

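Thus, the local cosine is defined as

$$
\psi_{nj}(t) = r_j\!\left(\frac{t - \alpha_j}{\varepsilon_j}\right) r_{j+1}\!\left(\frac{\alpha_{j+1} - t}{\varepsilon_{j+1}}\right) \sqrt{\frac{2}{\alpha_{j+1} - \alpha_j}}\, \cos\left[\frac{\pi\left(n + \frac{1}{2}\right)(t - \alpha_j)}{\alpha_{j+1} - \alpha_j}\right], \tag{3.7}
$$

where $\alpha_j$ and $\alpha_{j+1}$ are the interval edges, $\varepsilon_j$ and $\varepsilon_{j+1}$ are the action radii of the operators at the two edges, and $r_j$, $r_{j+1}$ denote the rising function $r_{[1]}$ applied at the two edges of the interval. Note that the local cosine defined by Equation (3.7) is the result of unfolding the "block cosine" at both edges, i.e.,

$$
\psi_{nj} = U^*(r_j, \alpha_j, \varepsilon_j)\, U^*(r_{j+1}, \alpha_{j+1}, \varepsilon_{j+1})\, \mathbf{1}_{I_j} C_{nj},
$$

where:

• $C_{nj}(t) = \sqrt{2/(\alpha_{j+1} - \alpha_j)}\,\cos\left[\pi(n + \frac{1}{2})(t - \alpha_j)/(\alpha_{j+1} - \alpha_j)\right]$ represents the block cosine function for the interval $I_j = [\alpha_j, \alpha_{j+1}]$, beginning at edge $\alpha_j$;

• $\mathbf{1}_{I_j}$ is the indicator function of $I_j$;

• $U^*(\cdot)$ is the unfolding operator applied at the left ($j$) and right ($j+1$) edges of the interval.

Thus, the Continuous Local Trigonometric Transform is the inner product $\langle f, \psi_{nj} \rangle$, where $\psi_{nj}$ is the local cosine defined above.

Instead of computing it in that manner, one may fold the function first and then take the inner product with the regular "block cosine," as in the expression below:

$$
\langle f, \psi_{nj} \rangle = \left\langle U(r_j, \alpha_j, \varepsilon_j)\, U(r_{j+1}, \alpha_{j+1}, \varepsilon_{j+1}) f,\ \mathbf{1}_{I_j} C_{nj} \right\rangle. \tag{3.8}
$$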

In practice, this simple observation is of great importance, since it means that $f$ can be preprocessed by folding, and the local cosine transform can then be computed with an ordinary cosine transform [2].
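The identity (3.8) can be verified on sampled data. The Python sketch below assumes the midpoint sampling described in Section E (sample $t$ stands for the point $t + 1/2$), integer edges, the cutoff $r_{[1]}$ at both edges, and an orthonormal DCT-IV written as an explicit matrix for clarity; a fast routine such as SciPy's `scipy.fft.dct(x, type=4, norm="ortho")` computes the same transform. It computes the coefficients of one interval twice, once as inner products with sampled local cosines and once by folding and applying the DCT-IV, and compares the results. The edges and radii are hypothetical.

```python
import numpy as np

def r1(t):
    """Rising cutoff r_[1](t) = r_sin(sin(pi t / 2)), constant outside [-1, 1]."""
    s = np.sin(np.pi * np.clip(t, -1.0, 1.0) / 2)
    return np.sin(np.pi / 4 * (1 + s))

def fold_edge(y, a, eps, r=r1):
    """Fold y in place about the edge between samples a-1 and a, with action radius eps samples."""
    k = np.arange(eps)
    rp, rm = r((k + 0.5) / eps), r(-(k + 0.5) / eps)
    right, left = y[a + k].copy(), y[a - 1 - k].copy()
    y[a + k] = rp * right + rm * left
    y[a - 1 - k] = rp * left - rm * right
    return y

def local_cosine(n, a0, a1, e0, e1, length, r=r1):
    """Local cosine psi_{n,j} on [a0, a1) with radii e0, e1, sampled at midpoints t + 1/2."""
    t = np.arange(length) + 0.5
    N = a1 - a0
    bell = r((t - a0) / e0) * r((a1 - t) / e1)
    return bell * np.sqrt(2.0 / N) * np.cos(np.pi * (n + 0.5) * (t - a0) / N)

def dct4(v):
    """Orthonormal DCT-IV: sqrt(2/N) * sum_k v[k] cos(pi (n + 1/2)(k + 1/2) / N)."""
    N = len(v)
    k = np.arange(N) + 0.5
    return np.sqrt(2.0 / N) * np.cos(np.pi * np.outer(k, k) / N) @ v

rng = np.random.default_rng(1)
x = rng.standard_normal(64)
a0, a1, e0, e1 = 16, 48, 6, 8  # requires e0 + e1 <= a1 - a0
direct = np.array([local_cosine(n, a0, a1, e0, e1, len(x)) @ x for n in range(a1 - a0)])
fast = dct4(fold_edge(fold_edge(x.copy(), a0, e0), a1, e1)[a0:a1])
```

The two coefficient vectors agree to rounding error, which is the discrete counterpart of Equation (3.8).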

It is also important to observe that, because the transform is defined as an inner product, it measures the amount of "similarity" between the signal $f(t)$ and the basis function $\psi_{nj}$. This is one of the key attributes that make the local cosine transform convenient for transforming speech signals and, therefore, good for compression and coding. The fact that speech can be considered a locally stationary signal with a reasonable correlation to sines and cosines may explain some of the good results obtained with a Local Trigonometric Transform.

E.  THE  DISCRETE  COSINE  TRANSFORM 

By restricting the variables to integers and using the discrete cosine transform, it is possible to obtain discrete versions of the local cosine. Equation (3.9) is the same formula as Equation (3.7), but with the variables replaced by integer values. In Equation (3.9) it is assumed that:

• $\alpha_j < \alpha_{j+1}$, where $\alpha_j$ and $\alpha_{j+1}$ are integers;

• the signal is sampled at integer points $t$, $\alpha_j \le t < \alpha_{j+1}$, which gives $\alpha_{j+1} - \alpha_j$ samples;

• $r_j$ and $r_{j+1}$ are the rising functions $r_{[1]}$ applied at the two edges of the interval;

• $\varepsilon_j > 0$ and $\varepsilon_{j+1} > 0$, with $\varepsilon_j + \varepsilon_{j+1} \le \alpha_{j+1} - \alpha_j$ (the number of samples), to ensure that the action regions are disjoint.

Equation  (3.9)  also  makes  a  distinction  between  the  left  and  right  endpoints, 
because  sampling  is  done  at  the  left  endpoint  of  each  interval.  If  sampling  is  done  in  the 
middle  of  the  intervals  (which  can  be  done  by  taking  the  function  in  Equation  (3.7)  and 
replacing  every  instance  of  t  with  t+1/2),  it  will  be  more  symmetric,  and  the  basis 
functions  will  be  cosines  sampled  between  grid  points.  The  result  is  the  following  discrete 
local  cosine  basis  function: 


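$$
\psi_{nj}^{(\text{DCT-IV})}(t) = r_j\!\left(\frac{t + \frac{1}{2} - \alpha_j}{\varepsilon_j}\right) r_{j+1}\!\left(\frac{\alpha_{j+1} - t - \frac{1}{2}}{\varepsilon_{j+1}}\right) \sqrt{\frac{2}{\alpha_{j+1} - \alpha_j}}\, \cos\left[\frac{\pi\left(n + \frac{1}{2}\right)\left(t + \frac{1}{2} - \alpha_j\right)}{\alpha_{j+1} - \alpha_j}\right] \tag{3.9}
$$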
F. APPLICATION TO SIGNAL ANALYSIS/SYNTHESIS

Given an arbitrary partition of a signal in time, it is possible to construct several smooth orthogonal bases using local cosines as basis functions. The scheme that leads to the best partition and the best basis for this application is introduced in the following chapters. This section explains how the DCT-IV can be used for analysis in the frequency domain and for subsequent synthesis in the time domain.

As mentioned in Sections C and D, the signal is first folded at the left and right ends of each interval. Then, an ordinary DCT-IV is used to compute the Local Cosine Transform for each of the windows obtained. It then becomes possible to analyze each time window through its frequency spectrum (from DC to $f_s/2$, where $f_s$ is the sampling frequency). To reconstruct the signal, the DCT-IV is applied again, since the orthonormal DCT-IV is its own inverse. As in the decomposition phase, the transform is computed with the regular "block cosine" rather than with the local cosine, and the intervals are then unfolded. By unfolding the right edge of the current interval and the left edge of the following one, the smoothness and integrity of the function are preserved, allowing the time-domain function to be reconstructed.
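Putting the pieces together, analysis folds every interior edge and applies a DCT-IV to each interval; synthesis applies the DCT-IV again and then unfolds. The Python sketch below is one possible arrangement, not the thesis code. It assumes integer interval edges, the cutoff $r_{[1]}$, action radii that keep neighboring action regions disjoint, and no folding at the two ends of the signal (equivalent to a zero action radius there). Because each fold rotates pairs of samples and each DCT-IV is orthogonal, reconstruction is exact and energy is preserved.

```python
import numpy as np

def r1(t):
    """Rising cutoff r_[1](t) = r_sin(sin(pi t / 2)), constant outside [-1, 1]."""
    s = np.sin(np.pi * np.clip(t, -1.0, 1.0) / 2)
    return np.sin(np.pi / 4 * (1 + s))

def dct4_matrix(N):
    """Orthonormal DCT-IV matrix; it is symmetric and therefore its own inverse."""
    k = np.arange(N) + 0.5
    return np.sqrt(2.0 / N) * np.cos(np.pi * np.outer(k, k) / N)

def _edge_weights(eps, r):
    k = np.arange(eps)
    return k, r((k + 0.5) / eps), r(-(k + 0.5) / eps)

def fold_edge(y, a, eps, r=r1):
    """Fold y in place about the edge between samples a-1 and a (midpoint sampling)."""
    k, rp, rm = _edge_weights(eps, r)
    right, left = y[a + k].copy(), y[a - 1 - k].copy()
    y[a + k] = rp * right + rm * left
    y[a - 1 - k] = rp * left - rm * right

def unfold_edge(y, a, eps, r=r1):
    """Inverse (transpose) of fold_edge."""
    k, rp, rm = _edge_weights(eps, r)
    right, left = y[a + k].copy(), y[a - 1 - k].copy()
    y[a + k] = rp * right - rm * left
    y[a - 1 - k] = rm * right + rp * left

def analyze(x, edges, radii, r=r1):
    """Fold at the interior edges, then apply a DCT-IV to each interval.
    edges = [0, ..., len(x)]; radii[i] is the action radius at edges[i + 1]."""
    y = np.asarray(x, dtype=float).copy()
    for a, eps in zip(edges[1:-1], radii):
        fold_edge(y, a, eps, r)
    return [dct4_matrix(b - a) @ y[a:b] for a, b in zip(edges[:-1], edges[1:])]

def synthesize(coeffs, edges, radii, r=r1):
    """Apply the DCT-IV to each coefficient block, then unfold the interior edges."""
    y = np.concatenate([dct4_matrix(len(c)) @ c for c in coeffs])
    for a, eps in zip(edges[1:-1], radii):
        unfold_edge(y, a, eps, r)
    return y

rng = np.random.default_rng(0)
x = rng.standard_normal(128)
edges, radii = [0, 32, 64, 96, 128], [8, 16, 8]  # hypothetical partition and radii
coeffs = analyze(x, edges, radii)
print(np.allclose(synthesize(coeffs, edges, radii), x))  # True
```

Compression would take place between the two calls, for example by quantizing or discarding small entries of `coeffs`.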
IV. WAVELET AND COSINE PACKET TRANSFORMS


This chapter presents the Wavelet Transform and two general time-frequency analysis schemes: the Wavelet Packet Transform and the Cosine Packet Transform.

A.  INTRODUCTION 

The goal of this thesis is to obtain the scheme best suited for the decomposition and reconstruction of speech signals, in particular one that can decompose a speech signal in terms of orthonormal basis functions. First, the Wavelet Transform (WT) and its main properties and characteristics are discussed. Next, the general concept of the Wavelet Packet Transform (WPT) is introduced. Finally, the Cosine Packet Transform (CPT) is presented. This last scheme initially performs a split in time, as opposed to transforms that first split the signal in the frequency domain.

B.  THE  WAVELET  TRANSFORM 

In  the  Wavelet  Transform  (WT)  algorithm,  the  sampled  data  set  is  passed  through 
the  low-pass  and  high-pass  filters  with  complementary  bandwidths,  known  as  quadrature 
mirror  filter  (QMF)  pairs  [7].  The  outputs  of  both  filters  are  decimated  by  a  factor  of  two. 
So,  at  each  scale,  we  have  a  set  of  high-pass  filtered  data  and  a  set  of  low-pass  filtered 
data.  Each  of  these  sets  has  half  as  many  elements  as  the  original  data  set,  as  a 
consequence  of  the  decimation.  The  low-pass  filtered  data  can  be  used  as  the  data  input 
for  another  pair  of  filters  identical  to  the  first  pair,  generating  another  set  of  low-  and 
high-pass  coefficients  at  the  next  lower  level  of  scale  [8]. 
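A minimal Python sketch of this pyramid algorithm, assuming the two-tap Haar pair as the QMF filters (the thesis itself compares against Daubechies filters, whose longer supports would require explicit boundary handling). Each stage filters, keeps every second output, and feeds the low-pass half to the next stage; the outputs are ordered like the subspaces H, LH, LLH, ... of the WT basis.

```python
import numpy as np

# Two-tap Haar QMF pair, assumed here for illustration.
H0 = np.array([1.0, 1.0]) / np.sqrt(2.0)   # low-pass
H1 = np.array([1.0, -1.0]) / np.sqrt(2.0)  # high-pass

def qmf_split(x):
    """Filter x with both QMF filters and decimate each output by two.
    The [1::2] slice keeps outputs built from complete pairs (x[2m], x[2m+1]);
    this indexing is specific to two-tap filters and even-length input."""
    lo = np.convolve(x, H0)[1::2]
    hi = np.convolve(x, H1)[1::2]
    return lo, hi

def wavelet_pyramid(x, levels):
    """Split, then split the low-pass half again, `levels` times.
    Returns [H, LH, LLH, ..., final low-pass] from the finest to the coarsest scale."""
    approx = np.asarray(x, dtype=float)
    details = []
    for _ in range(levels):
        approx, hi = qmf_split(approx)
        details.append(hi)
    return details + [approx]

x = np.random.default_rng(0).standard_normal(32)  # 2**5 samples
print([len(c) for c in wavelet_pyramid(x, 4)])     # [16, 8, 4, 2, 2]
```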

This process can continue until the set of original coefficients has been reduced to the minimal scale level, which is two coefficients. Figure 4.1 presents the pyramid algorithm of the WT. Figure 4.2 shows how a unit interval of length $2^j$ samples can be decomposed to obtain a maximum of $j$ levels of transform data. Figure 4.3 presents the tiling diagram that corresponds to the WT decomposition. It shows that the WT works well if the signal is composed of strong components of short duration, i.e., bursts, which means that the WT is a good detector of transients. It also works well if the signal is composed of low-frequency components of long duration [9].

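As stated earlier, speech is composed of portions of either high frequency or low frequency, both with a typical minimum duration of about 15 milliseconds. The high-frequency portions are therefore not short bursts, yet the WT analyzes high frequencies with its finest time resolution and coarsest frequency resolution. These characteristics indicate that the WT may not be the best scheme for speech signal analysis.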
Figure 4.1 WT implementation: a bank of QMF pairs

Figure 4.2 Wavelet transform: decomposing $2^j$ samples into a maximum of $j$ levels

Figure 4.3 WT tiling diagram

C.  THE  WAVELET  PACKET  TRANSFORM 

The WT is not the only way to split the signal in the frequency domain. The Short Time Fourier Transform (STFT), for example, is another possible scheme. However, in the STFT, both the time and frequency resolution are kept constant by the choice of the time window length (Figure 4.4).

Actually,  both  the  WT  and  the  STFT  can  be  viewed  as  part  of  a  general  scheme 
called  the  Wavelet  Packet  Transform  (WPT),  which  is  a  collection  of  possible  sets  of 
orthonormal  basis  functions. 


Figure 4.5 depicts the general tree structure for the WPT. Note that the heavy lines indicate the graph that forms the WPT basis. The symbol L or H is assigned to each half-band division, depending on whether it is the low- or high-frequency band. Proceeding down the tree, the symbols are concatenated according to the same rule, so that, for example, LH denotes the high-frequency half of band L. Note that the WT basis consists of the subspaces H, LH, LLH, LLLH, and LLLL.
Figure 4.5 General tree structure for the WPT


The  sequences  L,  LL  and  LLL  are  intermediate  steps  leading  to  the  generation  of 
the  subspaces  of  the  wavelet  basis  at  the  lower  levels. 

Since each frequency split produces a low-pass and a high-pass version of the filtered data (i.e., the two half-branches of the tree), a very large number of graphs representing different orthonormal bases can be created. Figure 4.6 presents three different wavelet packet decompositions. The first is a subband decomposition scheme [10], whose basis is composed of the eight bottom divisions. The second is another possible decomposition leading to an orthonormal basis. The third decomposition is exactly the opposite of that obtained using the WT. Figure 4.7 illustrates the tiling diagram that corresponds to the third decomposition. Note the higher frequency resolution for higher frequencies and the higher time resolution for lower frequencies.
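The number of available bases grows very quickly with the depth of the tree. Each node is either kept as a basis element or replaced by independent choices for its two children, so the count for a tree of depth $J$ satisfies $N(J) = N(J-1)^2 + 1$ with $N(0) = 1$, which implies $N(J) \ge 2^{2^{J-1}}$ for $J \ge 1$. The short Python function below simply evaluates this recurrence; it counts bases and does not construct them.

```python
def num_bases(depth):
    """Number of orthonormal bases in a full binary packet tree of the given depth."""
    n = 1  # depth 0: the root alone is the only basis
    for _ in range(depth):
        n = n * n + 1  # keep the new root, or choose independently in its two subtrees
    return n

print([num_bases(j) for j in range(6)])  # [1, 2, 5, 26, 677, 458330]
```

A tree with eight bottom divisions (depth 3, as in the subband decomposition of Figure 4.6) already admits 26 bases, and the count roughly squares with each added level, which is why the Best Basis algorithm of Chapter V searches the tree bottom-up instead of enumerating bases.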


Figure 4.6 Three different wavelet packet decompositions leading to three different bases


Figure 4.7 Tiling diagram for the decomposition of Figure 4.6c
D. THE COSINE PACKET TRANSFORM


The Cosine Packet Transform (CPT) is a scheme that performs a time-splitting decomposition prior to the frequency transformation. If one imagines the original signal in the time domain being split successively into two halves at each iteration, a tree configuration results (Figure 4.8). If the transform imposes no restriction on the support intervals of the window envelopes, the tree does not need to be homogeneous. This means that the windows do not need to be combined in the same way (either in pairs or in any other specific manner), and the subspaces do not need to be of equal size. So, by analogy with the wavelet packet case, one is now faced with a large number of possible orthonormal basis configurations, each of which is considered a cosine packet. It is important to observe that in the cosine packet case the windows need not be of dyadic size; they may be of arbitrary size. However, in this thesis, only dyadic-sized windows are considered.




We also note that, as one goes down the tree, time resolution improves by a
factor of two at each level, while frequency resolution decreases by a factor of two.
Figure 4.9 presents the tiling diagram that corresponds to the tree
configuration shown in Figure 4.8. The CPT works in such a way that, after time splitting
to a certain depth, a basis is selected by some criterion. Then, the DCT-IV is applied to
each time window.
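A minimal sketch of this last step, assuming the orthonormal DCT-IV definition X(k) = sqrt(2/N) Σ x(n) cos[π(n + 1/2)(k + 1/2)/N]. For simplicity it cuts the signal into dyadic blocks with rectangular, non-overlapping windows; the actual CPT uses smooth, overlapping window envelopes (folding), which this sketch omits. The helper names are hypothetical. Because the orthonormal DCT-IV is its own inverse, applying it twice recovers each block.

```python
import math


def dct_iv(x):
    """Orthonormal DCT-IV (O(N^2) direct form). It is its own inverse."""
    n_len = len(x)
    scale = math.sqrt(2.0 / n_len)
    return [
        scale * sum(
            x[n] * math.cos(math.pi / n_len * (n + 0.5) * (k + 0.5))
            for n in range(n_len)
        )
        for k in range(n_len)
    ]


def split_dyadic(signal, depth):
    """Cut the signal into 2**depth equal blocks (rectangular windows).
    Assumes len(signal) is divisible by 2**depth."""
    block = len(signal) // (2 ** depth)
    return [signal[i:i + block] for i in range(0, len(signal), block)]


def block_dct_iv(signal, depth):
    """Apply the DCT-IV independently to every block at the given depth."""
    return [dct_iv(block) for block in split_dyadic(signal, depth)]


x = [math.sin(0.3 * t) for t in range(16)]
coeffs = block_dct_iv(x, depth=2)                 # four blocks of four samples
recon = [s for c in coeffs for s in dct_iv(c)]    # DCT-IV again inverts each block
print(max(abs(a - b) for a, b in zip(x, recon)))  # close to 0
```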


V. THE BEST BASIS ALGORITHM


A.  INTRODUCTION 

When a choice of bases exists for the representation of a signal, it is possible to
determine the best one according to some predetermined criterion. The criterion always
depends on the type of signal and on the user’s objective. In this case, the signal is speech
and the objective is to minimize the number of symbols used to represent the information
contained in a given interval (i.e., to minimize the entropy of that interval).
The “best basis” criterion allows several information cost functionals to be minimized,
including entropy [6,11].

We recall that the entropy of a vector u = { u(k) } is defined by:

$$ H(u) = \sum_k p(k) \log\frac{1}{p(k)}, \qquad (5.1) $$

where p(k) = |u(k)|^2 / ||u||^2 is the normalized energy of the kth element of the sequence, and
p log(1/p) is set to 0 if p = 0. H(u) is the entropy of the probability distribution function
(or pdf) given by p. Note that H(u) is not an information cost functional, i.e., it is not a
sum of terms that each depend only on a single element u(k), because every p(k) depends on
the total energy ||u||^2. But the functional

$$ l(u) = \sum_k |u(k)|^2 \log\frac{1}{|u(k)|^2} $$

is such a function. If l(u) is minimized, then H(u) is also minimized (the total energy
||u||^2 is the same in every orthonormal basis), as seen from the expression:

$$ H(u) = \|u\|^{-2}\, l(u) + \log \|u\|^2. \qquad (5.2) $$
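The identity (5.2) can be checked numerically. The Python sketch below (function names are illustrative; natural logarithms are assumed, although any base works if used consistently) evaluates (5.1) directly and through the additive functional l(u).

```python
import math


def entropy(u):
    """H(u) from (5.1): entropy of the normalized energy distribution."""
    energy = sum(v * v for v in u)
    h = 0.0
    for v in u:
        p = v * v / energy
        if p > 0:  # p log(1/p) is taken as 0 when p = 0
            h += p * math.log(1.0 / p)
    return h


def l_cost(u):
    """Additive functional l(u) = sum |u(k)|^2 log(1/|u(k)|^2)."""
    return sum(v * v * math.log(1.0 / (v * v)) for v in u if v != 0)


u = [3.0, -1.0, 0.5, 0.0, 2.0]
energy = sum(v * v for v in u)
lhs = entropy(u)
rhs = l_cost(u) / energy + math.log(energy)   # right-hand side of (5.2)
print(lhs, rhs)                                # the two values agree
```

Because l(u) is a plain sum over elements, the cost of a parent node can be compared directly with the sum of its children's costs, which is what the search in the next section relies on.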

B.  THE  BEST  BASIS  ALGORITHM  METHOD 

Initially, the algorithm computes the entropy of every interval, or “node,”
of the tree. Figure 5.1 presents an example of a cosine packet tree with the corresponding
computed entropies. The Best Basis Algorithm searches the tree in a bottom-up direction
and, whenever a parent node has a lower cost than the sum of its children’s costs, the Best Basis
Algorithm flags the parent. If the sum of the children’s costs is lower than that of the
parent node, this lower cost is assigned to the parent. Similarly, children are flagged when
they have a lower information cost than their parents. This step avoids the need to
examine any node more than twice: once as a child and once as a parent. Figure 5.2
presents the new and the former (in parentheses) information costs for each node shown in
Figure 5.1. Then, after all nodes in the tree have been examined, the Best Basis
Algorithm selects the topmost flagged nodes, which constitute a basis; whenever a
topmost flagged node is encountered, the remaining nodes in its subtree
are discarded. Figure 5.3 displays the best basis nodes for this example as shaded blocks.
Further details may be found in [4]. Figure 5.4 shows a Best Basis tiling scheme resulting
from the decomposition shown in Figure 5.3. Each resulting cell occupies one portion of the
time axis and covers the whole frequency spectrum.
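A minimal Python sketch of this bottom-up search, assuming the tree is stored as a dictionary that maps a node (level, position) to its information cost, and that the two children of node (level, position) are (level + 1, 2·position) and (level + 1, 2·position + 1). The data structure and names are hypothetical; only the comparison rule follows the description above. Ties are resolved in favor of the parent.

```python
def best_basis(cost, max_level):
    """Bottom-up best basis search over a binary tree of costs.

    cost: dict mapping (level, position) -> information cost (e.g. l(u)).
    Returns the sorted list of nodes forming the minimum-cost basis.
    """
    best = dict(cost)  # best cost achievable within each node's subtree
    flagged = {node: True for node in cost if node[0] == max_level}
    for level in range(max_level - 1, -1, -1):
        for pos in range(2 ** level):
            children = best[(level + 1, 2 * pos)] + best[(level + 1, 2 * pos + 1)]
            if cost[(level, pos)] <= children:
                flagged[(level, pos)] = True    # parent is cheaper: flag it
            else:
                flagged[(level, pos)] = False
                best[(level, pos)] = children   # pass the lower cost upward
    # Keep the topmost flagged nodes; everything below them is discarded.
    basis, stack = [], [(0, 0)]
    while stack:
        level, pos = stack.pop()
        if flagged[(level, pos)]:
            basis.append((level, pos))
        else:
            stack.extend([(level + 1, 2 * pos), (level + 1, 2 * pos + 1)])
    return sorted(basis)


costs = {(0, 0): 10.0, (1, 0): 3.0, (1, 1): 6.0,
         (2, 0): 1.0, (2, 1): 1.5, (2, 2): 4.0, (2, 3): 3.5}
print(best_basis(costs, 2))  # [(1, 1), (2, 0), (2, 1)]
```

In the example, node (1, 0) is replaced by its two children (1.0 + 1.5 < 3.0), node (1, 1) is kept (6.0 < 4.0 + 3.5), and the root is rejected (8.5 < 10.0).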


Figure  5.1  Cosine  packet  tree  with  computed  entropies  for  every  interval  (node) 
Figure 5.2 New (and former) computed entropy for each node
VI. COMPRESSION AND DENOISING SCHEMES


This chapter describes the compression and denoising schemes used in this
research. It is divided into five sections. First, the motivating concepts are introduced. The
remaining sections present the minimum time window, voiced-unvoiced segmentation, adaptive
thresholding, and denoising.

A.  INTRODUCTION 

Initial research for this thesis included a review of existing lossy compression
techniques, which are divided into two main classes: Lossy Predictive Coding and
Transform Coding [12]. This thesis focuses on Transform Coding. The
Transform Coding technique that has been most widely discussed, applied, and tested is the
Wavelet Transform. However, as explained in Chapter IV, wavelets are more appropriate
for the analysis of either transients or long-duration, low-frequency stationary signals
than for speech signals.

As shown in Chapter III, the Local Cosine Transform has good time and
frequency resolution. Also, unlike the Fourier Transform, the Discrete Cosine Transform
IV (DCT-IV) decorrelates the signal in each window, which facilitates compression.
Experiments for this research demonstrated that the Best Basis Algorithm, besides
selecting the basis with minimal entropy, is also able to split the speech signal into locally
stationary time segments. As a result, the combination of the Cosine Packet scheme with
a method that selects the Best Basis (BB) configuration to minimize the entropy in each
interval seems to be the most appropriate for the applications considered here.

An important characteristic of the Cosine Packet Transform (CPT) is that it allows
time resolution to be controlled. If one uses the WPT with the Best Basis Algorithm on
speech, the algorithm chooses the basis by minimizing some information
cost of the frequency coefficients. Thus, in the WPT case, time resolution is not a
function of the physical properties of speech. Instead, it depends on each scale, which
in turn is selected by the best basis criterion. Also, the user must select the maximum
frequency splitting depth by choosing the worst (largest) time resolution, not the best.
With the CPT, on the other hand, it is possible to choose the depth and, thus, to determine
the minimum time interval, which ideally should coincide with the shortest locally
stationary portions of speech.

Once the signal is divided into its locally stationary intervals, the DCT-IV
is applied to transform the signal to the frequency domain. Then, for every
time window, the signal is passed through a thresholding scheme that selects different
percentages of coefficients according to the frequency and energy content of each
frame. In essence, the speech is divided into voiced and unvoiced sounds, which makes a
voiced-unvoiced segmentation scheme necessary.

Recordings made for this research included noise generated by the equipment.
This noise consisted mainly of a 60 Hz hum and its harmonic components. Since the
noise frequencies in each time window were detectable, it was possible to denoise the
words and sentences used in the experiments. The system is composed of three main
blocks, as shown in Figure 6.1.


Figure  6.1  System  block  diagram 


The  Cosine  Packet  scheme,  presented  in  Chapters  IV  and  V,  is  based  on  the  CPT  and 
Best  Basis  Algorithm.  The  encoding/compression  schemes  investigated  in  this  work  will 
be  presented  in  Chapter  VII. 

B.  MINIMUM  TIME  WINDOW  SIZE 

The choice of the minimum time window depends on the time and frequency
resolution desired. We recall that, in the CPT scheme, the further down the tree, the
better the time resolution and the worse the frequency resolution. A second consideration
is to represent a clean signal optimally, so that the smallest possible number of DCT-IV
coefficients (in the frequency domain) best represents the energy and frequency content of
each interval. Ideally, the signal should be divided into the exact locally stationary portions
of the speech, each beginning and ending at the correct points. Good compression ratios are
then obtained, since each time interval can be represented by one or two coefficients.

The best minimum window sizes were 32 or 16 milliseconds for most of the
experiments, and 8 milliseconds for some of them. Since samples were taken at 8 kHz,
the intervals are 256, 128, or 64 samples long, respectively. Using windows
shorter than 16 ms degraded the frequency resolution for most of the test words and
sentences, which led to the following two results:

(1) Loss of coarticulation;

(2) Degradation in denoising performance.

Although the depth corresponding to the 16-ms minimum-size window did not
always give the best (least) mean square error (i.e., compared with the 32-ms and
8-ms test windows), the difference obtained in that parameter was not large enough to
justify choosing another depth. This decision was driven mainly by the quality of the
reconstructed speech. Consequently, 16 milliseconds was selected as a compromise for the
minimum window size.
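The relation between window duration, sample count, and tree depth is simple arithmetic. The Python sketch below assumes the 8 kHz sampling rate stated above and a signal of n = 8192 samples (the length used in the example of Chapter VII); the function names are illustrative. Under these assumptions, the 16-ms minimum window corresponds to a maximum splitting depth of 6.

```python
import math

FS = 8000        # sampling rate in Hz (8 kHz)
N_SIGNAL = 8192  # signal length in samples (assumed, as in Chapter VII)


def window_samples(duration_ms):
    """Number of samples in a window of the given duration."""
    return int(FS * duration_ms / 1000)


def depth_for_window(duration_ms, n=N_SIGNAL):
    """Dyadic splitting depth d such that n / 2**d equals the window length."""
    return int(math.log2(n // window_samples(duration_ms)))


for ms in (32, 16, 8):
    print(ms, "ms ->", window_samples(ms), "samples, depth", depth_for_window(ms))
```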
C. VOICED-UNVOICED SEGMENTATION

This section presents an experiment based on the voiced-unvoiced segmentation
scheme proposed by Wesfreid and Wickerhauser [13]. Recognition of certain excitation
types was attempted in order to obtain the best possible compression scheme. Therefore,
speech partitioning became one of the by-products of this research. Once the magnitude
spectrum and energy of each interval are obtained, it is possible to identify voiced and
unvoiced portions of the speech.

The  spectrum  is  divided  into  six  main  frequency  ranges.  Table  6.1  displays  the 
low  and  high  frequencies  in  each  range,  as  well  as  the  corresponding  amplitudes  of  the 
vertical  bars  used  to  separate  the  intervals. 


Frequency range (Hz)     Vertical bar
Low        High          amplitude
0          250           0.1
251        500           0.25
501        1,000         0.5
1,001      2,000         1.0
2,001      3,000         2.0
3,001      4,000         2.5

Table 6.1 Frequency ranges and display
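To place a DCT-IV coefficient in one of the ranges of Table 6.1, its index must first be converted to a frequency. A minimal sketch, assuming an orthonormal DCT-IV over a window of N samples at fs = 8 kHz, so that coefficient k (counting from 0) is centred at (k + 1/2)·fs/(2N) Hz; the function names are hypothetical and not taken from the thesis code.

```python
FS = 8000  # sampling rate in Hz

# (upper edge in Hz, bar amplitude) for the six ranges of Table 6.1
TABLE_6_1 = [(250, 0.1), (500, 0.25), (1000, 0.5),
             (2000, 1.0), (3000, 2.0), (4000, 2.5)]


def coeff_frequency(k, window_len, fs=FS):
    """Centre frequency (Hz) of DCT-IV coefficient k in a window of window_len samples."""
    return (k + 0.5) * fs / (2 * window_len)


def bar_amplitude(freq_hz):
    """Vertical bar amplitude for the range of Table 6.1 that contains freq_hz."""
    for upper, amplitude in TABLE_6_1:
        if freq_hz <= upper:
            return amplitude
    raise ValueError("frequency above fs/2")


# Largest coefficient at index 20 of a 128-sample (16-ms) window:
f = coeff_frequency(20, 128)
print(f, bar_amplitude(f))
```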

Figure 6.2 illustrates the short-time energy and zero-crossing plots (top) obtained with
Voicedit, from the SPC Toolbox [16], for the sentence “This place blows” (bottom).
Figure 6.3 presents four plots. The first shows the time domain plot. The second and third
plots show, respectively, the frequency behavior according to Table 6.1 and the energy
behavior obtained by summing the squares of the coefficients in each interval. The fourth
plot (bottom) shows the spectrogram of the speech signal. Note that the trends of both the
frequency and energy plots match those of Figure 6.2.

Voiced-unvoiced segmentation gave the best results when every interval whose
largest coefficient lay at a frequency below 1,000 Hz, and whose energy was above a
certain threshold, was classified as voiced, and every interval whose largest coefficient lay at
a frequency above 1,000 Hz was classified as unvoiced. Figure 6.3 illustrates that a
voiced sound results in a high-energy, low-frequency representation (largest coefficient
below 1,000 Hz) for those segments. This is the case for the sounds “/i/,”
“/a/,” and “/o/.” In turn, unvoiced sounds are recognized as segments with high
frequency (largest coefficient above 1,000 Hz) and low energy content. This is
the case for the sound “/s/” in “this” and “place.” Figure 6.4 shows the result of the
voiced-unvoiced segmentation scheme in the middle plot. The
bottom plot contains the corresponding spectrogram. Figures 6.5, 6.6, and 6.7 present the
same kind of plots for the sentence “Be nice to your sister.” Again, the voiced sounds
“/ai/,” “/o/,” and “/i/” are distinguishable from the unvoiced “/s/” and “/t/.”
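The decision rule above can be sketched as follows. The energy threshold is not specified in the text, so `energy_threshold` is a hypothetical parameter, and intervals that satisfy neither rule (largest coefficient below 1,000 Hz but low energy) are labeled "undecided" here; the thesis code may treat them differently. Coefficient frequencies use the DCT-IV bin centre (k + 1/2)·fs/(2N), an assumption of this sketch.

```python
FS = 8000  # sampling rate in Hz


def classify_interval(coeffs, energy_threshold, fs=FS):
    """Label one time window 'voiced', 'unvoiced' or 'undecided'.

    coeffs: DCT-IV coefficients of the window (one per sample).
    """
    n = len(coeffs)
    energy = sum(c * c for c in coeffs)           # sum of squared coefficients
    k_max = max(range(n), key=lambda k: abs(coeffs[k]))
    f_max = (k_max + 0.5) * fs / (2 * n)          # frequency of largest coefficient
    if f_max > 1000:
        return "unvoiced"
    if energy > energy_threshold:
        return "voiced"
    return "undecided"


low = [0.0] * 128
low[10] = 5.0      # strong component near 328 Hz
high = [0.0] * 128
high[60] = 0.3     # weak component near 1,891 Hz
print(classify_interval(low, 1.0), classify_interval(high, 1.0))
```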


D.  ADAPTIVE  THRESHOLDING 

This section uses the partitioning of speech into voiced and unvoiced segments to
implement an adaptive scheme for selecting cosine packet coefficients.

Experiments showed that more natural-sounding speech was reconstructed after
compression when more coefficients were used to represent voiced than unvoiced segments.
This led to the use of a different percentage of coefficients in each of the following four
cases:

A) Low frequencies, low energy

B) Low frequencies, high energy

C) High frequencies, low energy

D) High frequencies, high energy.

The need to select more coefficients to represent the voiced segments of speech is
illustrated by compressing the isolated, noise-free word “nice.” As
explained in Section B, the chosen minimum window size is 16 ms. Figure 6.8 shows
that, when the compression scheme is set to keep one cosine packet coefficient per 16-ms
window to represent the phoneme /i/, the higher formants of that phoneme are lost. As a
result, the phoneme /i/ tends to sound like a /u/. This example shows that
more than one coefficient may be required to represent voiced phonemes accurately.
Figure 6.9 presents the plots that result when two CP coefficients per 16-ms window are
selected to represent voiced phonemes (including the phoneme /i/), and one CP coefficient
per 16- or 32-ms interval is selected to represent unvoiced phonemes. Although a
lower mean percentage of selected coefficients is achieved (and, thus, a higher
compression ratio), the sound /i/ is correctly reconstructed without affecting the other
phonemes of the word “nice.”

Similar findings were obtained with other voiced phonemes such as /a/ and /o/. In
addition, experiments showed that the voiced plosive /p/ was degraded by the
compression process and sounded like an /h/. Keeping three cosine packet coefficients per
16-ms window for voiced segments led to a more accurate representation of the
information after compression, as confirmed by the smaller MSE and better sound quality
of the reconstructed signal. Further experiments showed that one cosine packet
coefficient per 16-ms interval is sufficient to represent the unvoiced segments accurately.
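A minimal sketch of the per-window selection rule these experiments settled on: the three coefficients of largest magnitude are kept in each voiced 16-ms window and one in each unvoiced window. The function names are hypothetical, the voiced/unvoiced label of each window is taken as given, and the sketch assumes every window is 16 ms long (it does not rescale the count for 32-ms windows).

```python
def keep_largest(coeffs, k):
    """Zero all but the k coefficients of largest magnitude."""
    ranked = sorted(range(len(coeffs)), key=lambda i: abs(coeffs[i]), reverse=True)
    keep = set(ranked[:k])
    return [c if i in keep else 0.0 for i, c in enumerate(coeffs)]


# Coefficients kept per 16-ms window, as reported in the experiments above.
KEEP = {"voiced": 3, "unvoiced": 1}


def adaptive_threshold(windows, labels):
    """Apply the voiced/unvoiced-dependent selection to each window."""
    return [keep_largest(w, KEEP[label]) for w, label in zip(windows, labels)]


windows = [[0.1, -4.0, 2.5, 0.3, -1.2], [0.2, 0.05, -0.9, 0.4, 0.1]]
out = adaptive_threshold(windows, ["voiced", "unvoiced"])
print(out)
```

With 128-sample windows, keeping three coefficients corresponds to about 2.3% of the voiced coefficients and keeping one to about 0.8% of the unvoiced ones.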


E.  DENOISING 

The previous sections considered only the problem of compressing noise-free
signals. However, some of our recordings had a significant amount of low-frequency
equipment noise located around 60 Hz and some of its harmonics. Therefore, to improve the
quality of the compressed signal, the noisy speech signal was denoised before the
compression scheme was applied. The denoising code is given in the Appendix.

Two different cases where noise was present were considered: noise-only data
segments and noisy speech segments. Noise-only data segments occur before and after
isolated word recordings, and between words in the sentence recordings. Experiments
showed that the cosine packet coefficients allowed the detection of noise-only segments.
The following two situations characterize the noise-only case according to
implementation ndencomp.m, given in the Appendix:

(1) The largest coefficient in the segment is at a frequency less than or equal to
62.5 Hz, and the second largest coefficient is at a frequency less than or equal to 300 Hz
or higher than 1,000 Hz;

(2) The largest coefficient in the segment is in the frequency range between 62.5 Hz
and 250 Hz, and the second largest coefficient is at a frequency less than 200 Hz.

The following situations characterize the noise-only case according to
implementation encp6.m, given in the Appendix:

(1) The largest coefficient in the segment is at a frequency less than or
equal to 125 Hz, and the second largest coefficient is at a frequency less than 300 Hz;

(2) The largest coefficient is at a frequency less than 62.5 Hz, and the
second largest coefficient is at a frequency higher than 1,000 Hz;

(3) The largest coefficient is at a frequency less than 500 Hz for the female
speaker, or less than 1,000 Hz for the male speaker, and the second largest coefficient is
at a frequency less than 125 Hz.

All remaining cases are considered noisy speech. For those cases, all
coefficients located at frequencies less than or equal to 62.5 Hz are zeroed out.
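The ndencomp.m rules can be sketched as a function of the frequencies of the two largest coefficients in a window. This Python version is a hypothetical re-expression of the text above, not a translation of the Appendix code: rule (2) reads "between 62.5 Hz and 250 Hz" as 62.5 < f ≤ 250 Hz, noise-only windows are simply set to zero, and the DCT-IV bin centre (k + 1/2)·fs/(2N) is assumed for coefficient frequencies.

```python
FS = 8000  # sampling rate in Hz


def coeff_freqs(n, fs=FS):
    """Centre frequency (Hz) of each DCT-IV coefficient in an n-sample window."""
    return [(k + 0.5) * fs / (2 * n) for k in range(n)]


def is_noise_only(coeffs, fs=FS):
    """Noise-only test following the two ndencomp.m conditions in the text."""
    freqs = coeff_freqs(len(coeffs), fs)
    order = sorted(range(len(coeffs)), key=lambda k: abs(coeffs[k]), reverse=True)
    f1, f2 = freqs[order[0]], freqs[order[1]]  # largest, second largest
    if f1 <= 62.5 and (f2 <= 300 or f2 > 1000):
        return True
    if 62.5 < f1 <= 250 and f2 < 200:
        return True
    return False


def denoise_window(coeffs, fs=FS):
    """Zero a noise-only window; otherwise zero coefficients at or below 62.5 Hz."""
    if is_noise_only(coeffs, fs):
        return [0.0] * len(coeffs)
    freqs = coeff_freqs(len(coeffs), fs)
    return [0.0 if f <= 62.5 else c for c, f in zip(coeffs, freqs)]


hum = [0.0] * 128
hum[1], hum[3] = 1.0, 0.4                         # ~47 Hz and ~109 Hz
speech = [0.0] * 128
speech[20], speech[1], speech[2] = 3.0, 1.0, 0.5  # ~641 Hz plus low-frequency hum
print(is_noise_only(hum), is_noise_only(speech))
cleaned = denoise_window(speech)
```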

Three specific noise-and-speech cases are presented as follows:

(1) Noisy speech, Case 1. An example of this case is the word “hey,”
where the sound /h/ was lost in the background noise. Because “/h/” has a higher frequency
content than the noise, it was possible to identify it and pick up one
more CP coefficient per interval, thus retrieving the sound of “/h/.” This example is
illustrated in Figures 6.10 and 6.11, which show time plots and spectrograms that
correspond to keeping one and two CP coefficients per 16-ms interval, respectively.

(2) Noisy speech, Case 2. This problem required distinguishing the
noise-only case from the noisy voiced stops /b/ and /p/. Distinguishing these sounds from
noise was easier than in Case 1 above, since the largest coefficient obtained for those
two phonemes was never below 250 Hz, making it possible to denoise without
interfering with those sounds.

(3) Noisy speech, Case 3. It was difficult to separate the weak
ending /s/, as in “cats” and “let’s,” from the background noise. Whenever this case
occurred, the Best Basis Algorithm produced a 32-ms time window whose two
largest coefficients were at frequencies below 125 Hz. Although the phoneme “/s/” is
located at frequencies higher than 125 Hz, its energy was too small to be differentiated
from that of the noise. Thus, the data contained in the phoneme /s/ were identified as noise
only and disregarded before the compression step. Figure 6.12 illustrates this case.


Figure 6.2 Sentence “This Place Blows,” male native speaker; top plot: short-time
energy and zero-crossing representation; bottom plot: time domain representation, fs = 8 kHz
Figure 6.3 Sentence “This Place Blows,” male native speaker, “compcp” implementation;
(a) Time domain plot; (b) Frequency behavior plot according to Table 6.1; (c) Energy
plot; (d) Spectrogram, using a Hanning time window of length 256 samples and an
overlap of 128 samples between adjacent windows, fs = 8 kHz
Figure 6.4 Sentence “This Place Blows,” male native speaker, “compcp” implementation;
(a) Time domain plot; (b) Voiced-unvoiced segmentation; (c) Spectrogram, using a
Hanning time window of length 256 samples and an overlap of 128 samples between
adjacent windows, fs = 8 kHz
Figure 6.6 Sentence “Be Nice to Your Sister,” female native speaker, “compcp”
implementation; (a) Time domain plot; (b) Frequency behavior plot according to Table
6.1; (c) Energy plot; (d) Spectrogram, using a Hanning time window of length 256
samples and an overlap of 128 samples between adjacent windows, fs = 8 kHz
Figure 6.7 Sentence “Be Nice to Your Sister,” female native speaker, “compcp”
implementation; (a) Time domain plot; (b) Voiced-unvoiced segmentation;
(c) Spectrogram, using a Hanning time window of length 256 samples and an overlap
of 128 samples between adjacent windows, fs = 8 kHz
Figure 6.8 Word “Nice,” female native speaker, “ndencomp” implementation, fixed
thresholding with 1% of coefficients kept after compression; (a) Original time plot;
(b) Spectrogram of original speech signal; (c) Plot after fixed thresholding selection
of coefficients is applied; (d) Spectrogram of processed signal (both spectrograms use a
Hanning time window of length 256 samples and an overlap of 128 samples between
adjacent windows, fs = 8 kHz)
Figure 6.9 Word “Nice,” female native speaker, “ndencomp” implementation, adaptive
thresholding, with an average of 0.98% of CP coefficients kept for compression; (a) Original
time domain plot; (b) Spectrogram of original speech signal; (c) Time domain plot of
processed signal; (d) Spectrogram of processed signal (both spectrograms use a Hanning
time window of length 256 samples and an overlap of 128 samples between adjacent
windows, fs = 8 kHz)


Figure 6.10 Word “Hey,” male non-native speaker, “ndencomp” implementation (/h/ lost
after the denoising scheme when it is identified as noise only); (a) Original time domain plot;
(b) Spectrogram of original signal; (c) Time plot after denoising/compression scheme;
(d) Spectrogram after denoising/compression scheme (both spectrograms use a Hanning
time window of length 256 samples and an overlap of 128 samples between adjacent
windows, fs = 8 kHz)
Figure 6.11 Word “Hey,” male non-native speaker, “ndencomp” implementation (/h/
recovered after the denoising scheme when it is identified as noisy speech); (a) Original
time domain plot; (b) Spectrogram after denoising/compression scheme; (c) Time plot
after denoising/compression scheme (both spectrograms use a Hanning time window of
length 256 samples and an overlap of 128 samples between adjacent windows, fs = 8 kHz)
Figure 6.12 Word “Cats,” female non-native speaker, “ndencomp” implementation (/s/
lost after the denoising scheme when it is identified as noise only); (a) Original time domain
plot; (b) Spectrogram of original speech; (c) Time domain plot after denoising/compression;
(d) Spectrogram after denoising/compression (both spectrograms use a
Hanning time window of length 256 samples and an overlap of 128 samples between adjacent
windows, fs = 8 kHz)
VII. ENCODING SCHEMES


This  chapter  is  divided  into  three  main  sections.  The  first  proposes  a  quantization 
scheme  to  transmit  the  CP  coefficients.  The  second  proposes  encoding  schemes  to 
transmit  the  side  information,  i.e.,  both  the  locations  and  the  initial  indexes  of  each 
segment.  The  third  section  presents  the  coding  scheme  used  to  transmit  the  coefficients 
vector,  the  locations  vector  and  the  vector  containing  the  initial  locations  of  each 
segment. 

A.  THE  QUANTIZATION  SCHEME 

Once data are available for transmission, they must be quantized and coded. Once
the compression scheme has proven to be efficient and to allow good-quality reconstruction,
the next step is to find a uniform quantizer that efficiently reproduces the
coefficients to be transmitted [14].

Three different vectors must be sent for speech compression. The first vector
contains the cosine packet coefficients. The second contains the locations of the
coefficients. The third vector contains the initial time locations of each segment. To
transmit the first vector, i.e., the coefficients vector, the following is done:

(1) The data are normalized by dividing the whole vector by the maximum
absolute value of all the coefficients. This value is the scaling factor;

(2) The whole vector is multiplied by QL/2 (where QL is the number of
quantizing levels selected by the user) and rounded to the closest integer.

By performing these steps, a QL-level quantizer is built. The normalization followed by
multiplication by QL/2 ensures that the positive and negative coefficient values always lie
between -QL/2 and +QL/2 (strictly, the rounded values can take QL + 1 integer values,
including zero).

(3) The scaling factor, equal to the maximum absolute value of all the
coefficients, is sent. At the receiver, the following steps are performed:

(a) Upon receiving the vector, divide it by QL/2, recovering the
rounded normalized coefficients vector;

(b) Use the scaling factor to recover the amplitudes of the original
coefficients. Even without sending the scaling factor, it was possible to recover the
coefficients and thus reconstruct the data; the only difference is that the data were scaled
in amplitude by a constant factor.
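A minimal Python sketch of steps (1) to (3) and of the receiver steps (a) and (b). The function names are hypothetical and this is not the thesis's quantx.m. Note that Python's built-in round() resolves exact halves to the nearest even integer, whereas MATLAB's round() rounds halves away from zero; the two differ only on exact ties.

```python
def quantize(coeffs, ql):
    """Steps (1)-(2): normalize by the largest magnitude, scale by QL/2, round."""
    scale = max(abs(c) for c in coeffs)            # scaling factor sent in step (3)
    q = [round(c / scale * ql / 2) for c in coeffs]
    return q, scale


def dequantize(q, ql, scale):
    """Receiver steps (a)-(b): divide by QL/2, then restore the amplitude."""
    return [v / (ql / 2) * scale for v in q]


coeffs = [12.0, -3.1, 0.4, -12.0, 7.7]
q, scale = quantize(coeffs, ql=16)
print(q)                          # [8, -2, 0, -8, 5]
print(dequantize(q, 16, scale))   # approximation of coeffs
```

Omitting the scaling factor in dequantize (using 1.0 instead) returns the same waveform shape scaled by a constant, which matches the observation in step (b).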

B.  PROPOSED  ENCODING  SCHEMES 
1.  Cosine  packet  coefficient  locations 

To transmit the second vector, i.e., the locations vector, the user must first find
the least costly means of transmission. The following example shows the sequence of a typical
locations vector L:

L = [ 1806 1807 1841 1842 1847 1930 1934 1935 2020 2021 2062 2147 2148 2188 2192
2193 2274 2318 2320 2322 2328 2406 2413 2414 2510 . . . ] .

Note that there are small differences between some values in this sequence, while larger
jumps occur less often. This is because the small differences occur within the same
segment, and the larger differences indicate a change from one segment to the adjacent
one. Thus, the differences between successive locations are encoded, since they require
fewer bits. The differential locations vector corresponding to the locations
vector above is given by the vector DL below:

DL = [ 1806 1 34 1 5 83 4 1 85 1 41 85 1 40 4 1 81 44 2 2 6 78 7 1 96 . . . ] .

Because differences are sent, the value of the first location must also be sent (it is the
first element of DL) to allow exact reconstruction of the coefficient locations during
decoding.
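The encoding and the receiver's reconstruction of L are a difference and a running sum, respectively. A minimal Python sketch with hypothetical function names, using the 25 locations listed above (with the fourteenth entry taken as 2188, the value consistent with DL):

```python
from itertools import accumulate


def diff_encode(locations):
    """First location followed by successive differences (vector DL)."""
    return [locations[0]] + [b - a for a, b in zip(locations, locations[1:])]


def diff_decode(dl):
    """Receiver side: the running sum of DL recovers the locations (vector RL)."""
    return list(accumulate(dl))


L = [1806, 1807, 1841, 1842, 1847, 1930, 1934, 1935, 2020, 2021, 2062, 2147,
     2148, 2188, 2192, 2193, 2274, 2318, 2320, 2322, 2328, 2406, 2413, 2414, 2510]
DL = diff_encode(L)
RL = diff_decode(DL)
print(DL)
print(RL == L)   # lossless reconstruction
```

Apart from the first element, every entry of DL here is below 100 and fits in 7 bits, whereas the absolute locations need 11 bits or more for a signal of 8192 samples.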
2. Segment Indexes

The third vector to transmit corresponds to the indexes of each segment. The Best
Basis Algorithm selects the basis by searching for the minimum entropy representation.
When the length of each new window is obtained, the algorithm outputs the two
parameters “b” and “d,” which allow the beginning index of the next time window to be
computed. The expression for obtaining index “i” is as follows:

$$ i = \frac{b\,n}{2^{d}} + 1, \qquad (7.1) $$

where n is the original (undivided) window length, i.e., the length of the signal. Since the
parameters “b” and “d” are small numbers, composed of one or two digits and, therefore,
much smaller than the indexes themselves, it is better to transmit the parameters instead of
the indexes. Thus, the two vectors “nde” and “nbe,” which contain the parameters “d” and
“b” of each time window, respectively, are transmitted. For example, suppose the vectors nde
and nbe are given as follows:

nde  =  [4665566566554423665]. 

nbe  =  [045  3 4  10  11  6  14  159  10  11  6726 56 57]. 

Considering n = 8192 time samples, the corresponding vector I containing the
initial locations of the first eight segments is given by:

I  =  [  1  512  640  768  1024  1280  1408  1536 ...] . 
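A minimal Python sketch applying (7.1) to the first eight (b, d) pairs read from nde and nbe above, with n = 8192. The function name is hypothetical. Note that (7.1) yields the 1-based start indexes 1, 513, 641, 769, and so on; the vector I listed above matches b·n/2^d without the +1 for every entry after the first, so the two conventions differ by one sample.

```python
N = 8192  # original signal length in samples


def segment_starts(b_values, d_values, n=N):
    """Start index of each segment from (7.1): i = b*n/2**d + 1 (1-based)."""
    return [b * n // 2 ** d + 1 for b, d in zip(b_values, d_values)]


d = [4, 6, 6, 5, 5, 6, 6, 5]    # first eight entries of nde
b = [0, 4, 5, 3, 4, 10, 11, 6]  # first eight entries of nbe
starts = segment_starts(b, d)
lengths = [N // 2 ** dd for dd in d]   # window length at depth d is n / 2**d
print(starts)                          # [1, 513, 641, 769, 1025, 1281, 1409, 1537]
# Each segment ends exactly where the next one begins:
print(all(s + length == nxt for s, length, nxt in zip(starts, lengths, starts[1:])))
```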

To reconstruct the locations vector of the non-zero coefficients, the receiver works
on the received vector of differential locations DL and reconstructs L. The reconstructed
vector is called RL.

Once the locations of the non-zero coefficients (vector RL) are available, along with
the locations of the beginning of each new segment (vector I), the receiver is able to
apply the inverse DCT-IV to each segment to reconstruct the speech signal.
C. CODING SCHEMES

After  quantization,  the  coefficients  vector  is  encoded  using  Huffinan  Coding 
[14],  which  minimizes  the  total  number  of  bits  by  assigning  more  bits  to  less  fi-equent 
symbols  and  less  bits  to  more  firequent  ones.  The  vectors  nde  and  nbe  are  transformed 
into  only  one  vector  and  passed  through  the  Huffinan  Coder.  The  inputs  include  the 
number  of  symbols  and  the  probabilities  of  each  one,  whereas  the  outputs  from  the 
Huffman  Coder  are  the  coded  words  and  average  length  of  the  symbols.  In  order  to 
perform  the  quantization  step  and  also  compute  the  probabilities  of  occurrences  of  each 
symbol  to  be  coded,  th6  function  quantx.m,  given  in  the  Appendix,  was  implemented. 
That  function  receives  the  original  vector,  the  number  of  levels  desired  for  quantization, 
and  returns  the  quantized  vector  and  the  probabilities  in  descending  order,  as  required  by 
the  Huffinan  Coder  (the  Huffinan  Coder  used  is  given  in  the  Appendix).  The  code  was 
adapted  as  a  function  to  be  called  whenever  this  step  is  necessary.  Finally,  the  exact 
number  of  bits  necessary  to  encode  the  differential  locations  vector  (DL)  is  computed. 
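The quantization and Huffman steps can be sketched as follows. The uniform quantizer (reconstruction levels spread evenly between the minimum and maximum value) is an assumption standing in for quantx.m, whose exact rule is given only in the Appendix; `quantize`, `huffman_code_lengths`, and `average_length` are hypothetical names. The Huffman part uses the standard construction that repeatedly merges the two least probable subtrees.

```python
import heapq
from collections import Counter


def quantize(x, levels):
    """Uniform quantizer (assumed; stands in for quantx.m).

    Maps each value to the index of the nearest of `levels` evenly spaced
    reconstruction values between min(x) and max(x). Returns the symbol
    indices and the symbol probabilities sorted in descending order.
    """
    lo, hi = min(x), max(x)
    step = (hi - lo) / (levels - 1) if hi > lo else 1.0
    symbols = [round((v - lo) / step) for v in x]
    counts = Counter(symbols)
    probs = sorted((c / len(symbols) for c in counts.values()), reverse=True)
    return symbols, probs


def huffman_code_lengths(probs):
    """Codeword length (in bits) of each symbol in a binary Huffman code."""
    if len(probs) == 1:
        return [1]  # a lone symbol still needs one bit
    # Heap entries: (probability, unique tie-breaker, symbols in this subtree)
    heap = [(p, i, [i]) for i, p in enumerate(probs)]
    heapq.heapify(heap)
    lengths = [0] * len(probs)
    next_id = len(probs)
    while len(heap) > 1:
        p1, _, s1 = heapq.heappop(heap)
        p2, _, s2 = heapq.heappop(heap)
        for s in s1 + s2:
            lengths[s] += 1  # each merge adds one bit to every symbol below it
        heapq.heappush(heap, (p1 + p2, next_id, s1 + s2))
        next_id += 1
    return lengths


def average_length(probs, lengths):
    """Average codeword length in bits per symbol."""
    return sum(p * n for p, n in zip(probs, lengths))


probs = [0.5, 0.25, 0.125, 0.125]
lengths = huffman_code_lengths(probs)
print(lengths)                         # [1, 2, 3, 3]
print(average_length(probs, lengths))  # 1.75
```

The exact bit count for a vector is then its average codeword length multiplied by the number of symbols it contains.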
VIII. TESTS AND RESULTS


A.  INTRODUCTION 

This chapter describes the procedures used to test the compression and
encoding schemes. First, the results of the basic compression scheme are presented. Next,
the results of the combined denoising/compression schemes are given. Then, the
performance of the encoding schemes used to transmit the compressed information is
presented. Finally, the performance of the Cosine Packet compression scheme is compared
with that obtained using the related Wavelet Packet Transform.

B.  COMPRESSION  SCHEME  RESULTS 

The compression-only scheme is first applied to “clean” speech to evaluate its
performance. To test this scheme on isolated words, we use the words “project” and
“cataratas,” and the segment “encyclope,” extracted from the word “encyclopedia.” This
compression scheme is also applied to the following two sentences:

“  Be  nice  to  your  sister,”  spoken  by  a  female  native  speaker;  and 

“  This  place  blows,”  spoken  by  a  male  native  speaker. 

1.  Description 

The  testing  software  requires  the  user  to  input  the  following: 

(1)  The  gender  of  the  speaker.  This  information  is  required  since  the  pitch 
for  a  female  speaker  occurs  at  a  higher  frequency  than  that  of  a  male  speaker; 

(2)  Word  or  sentence  to  be  compressed; 

(3) Maximum depth used for the cosine packet time splitting, which in
turn fixes the minimum size of the window.

The  following  outputs  are  provided: 

(1) The mean square error between the original and the reconstructed
speech signal;

(2) The number of non-zero cosine packet coefficients in the original
signal (ONCOEF);

(3)  The  number  of  non-zero  cosine  packet  coefficients  selected  by  the 
compression  scheme  (FNCOEF); 

(4)  The  reconstructed  speech  signal  obtained  after  compression. 

Two  different  compression  implementations  were  considered,  which  differ  in  the 
number  of  cosine  packet  coefficients  kept  to  compress  the  speech  signal.  The  first 
compression  scheme  (implemented  in  compcp.m,  given  in  the  Appendix)  selects  the 
cosine  packet  coefficients  as  follows: 

(1)  Keep  the  top  0.5%  non-zero  coefficients  (rounded  to  the  closest 
integer)  in  each  time  window  when  the  speech  segment  is  detected  as  unvoiced;  this 
percentage  means  selecting  one  coefficient  out  of  every  interval  containing  128 
coefficients,  one  coefficient  out  of  every  interval  containing  256  coefficients,  and  so  on, 
according  to  the  result  of  the  rounding  process; 

(2)  Keep  the  top  1.3%  non-zero  coefficients  (rounded  to  the  closest 
integer)  for  each  time  window  of  minimum  length  (16  ms)  when  the  speech  segment  is 
identified  as  voiced;  this  means  selecting  two  coefficients  out  of  every  128  coefficients, 
three  coefficients  out  of  every  interval  containing  256  coefficients,  and  so  on; 

(3)  Keep  the  top  2.34%  non-zero  coefficients  (rounded  to  the  closest 
integer)  for  each  time  window  larger  than  16  ms  when  the  speech  segment  is  identified  as 
voiced;  this  means  selecting  three  coefficients  out  of  every  interval  containing  128 
coefficients,  six  coefficients  out  of  every  interval  containing  256  coefficients,  and  so  on. 

The  second  compression  scheme  (implemented  in  necompcp.m  and  given  in  the 
Appendix)  uses  the  following  schemes  to  compress  the  speech  signal: 

(1)  Keep  the  top  0.5%  non-zero  coefficients  (rounded  to  the  closest 
integer)  in  each  time  window  when  the  speech  segment  is  unvoiced;  this  means  selecting 
one  coefficient  out  of  every  interval  containing  128  coefficients,  one  coefficient  out  of 
every  interval  containing  256  coefficients,  and  so  on; 
(2) Keep the top 1.3% non-zero coefficients (rounded to the closest
integer) for each time window when the speech segment is identified as voiced; this
means selecting two coefficients out of every interval containing 128 coefficients, three
coefficients out of every interval containing 256 coefficients, and so on.
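Both selection rules can be summarized in a small sketch: the number of coefficients kept in a window is the stated percentage of the window length, rounded to the closest integer, and the largest coefficients are kept. Ranking by absolute value and rounding half up (as MATLAB's `round` does for positive values) are assumptions, and the function names are hypothetical. The 128-sample minimum window corresponds to 16 ms at 8 kHz.

```python
import math

MIN_WINDOW = 128  # samples in a 16 ms window at 8 kHz


def keep_fraction(voiced, window_len, scheme):
    """Fraction of a window's coefficients kept by each scheme described above."""
    if not voiced:
        return 0.005                      # both schemes: 0.5% for unvoiced windows
    if scheme == "compcp" and window_len > MIN_WINDOW:
        return 0.0234                     # compcp: 2.34% for longer voiced windows
    return 0.013                          # 1.3% otherwise


def n_keep(voiced, window_len, scheme):
    """Number of coefficients kept, rounded half up to the closest integer."""
    return math.floor(keep_fraction(voiced, window_len, scheme) * window_len + 0.5)


def select_coefficients(coeffs, voiced, scheme):
    """Zero all but the n_keep largest-magnitude coefficients of one window."""
    k = n_keep(voiced, len(coeffs), scheme)
    order = sorted(range(len(coeffs)), key=lambda i: abs(coeffs[i]), reverse=True)
    keep = set(order[:k])
    return [c if i in keep else 0.0 for i, c in enumerate(coeffs)]


for scheme in ("compcp", "necompcp"):
    print(scheme, [n_keep(v, n, scheme) for v in (False, True) for n in (128, 256)])
# compcp [1, 1, 2, 6]
# necompcp [1, 1, 2, 3]
```

The printed counts (unvoiced 128 and 256, then voiced 128 and 256) match the "one out of 128", "two out of 128", "three/six out of 256" figures given in the text.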

2.  Experimental  Results 

Results obtained for the two compression schemes are presented in Table 8.2. In
Chapter VI, Figures 6.8 and 6.9 present time-domain plots and spectrograms for the fixed
and the adaptive thresholding compression schemes, respectively. The parameters used to
measure the degradation due to the compression scheme are:

(1) The mean square error (MSE) between the original and the reconstructed
speech signal;

(2) A subjective evaluation, by five different listeners, of the quality of the
reconstructed signal compared with that of the original signal. The evaluation
was graded on a scale from 1 to 5, according to Table 8.1.


GRADE   SPEECH QUALITY   LEVEL OF DISTORTION
5       Excellent        Imperceptible
4       Good             Just perceptible but not annoying
3       Fair             Perceptible and slightly annoying
2       Poor             Annoying but not objectionable
1       Unsatisfactory   Very annoying and objectionable

Table 8.1 Mean opinion score table


(3) The ratio between the number of non-zero cosine-packet coefficients kept after
compression and the total number of initial non-zero coefficients obtained with the cosine
packet decomposition, expressed as a percentage (FNCOEF/ONCOEF %).
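The two objective measures can be computed as below; the kept-coefficient percentage reproduces entries of Table 8.2 (for example, 124 of 8192 coefficients is about 1.51%). The function names are illustrative only.

```python
def mse(original, reconstructed):
    """Mean square error between two equal-length signals."""
    if len(original) != len(reconstructed):
        raise ValueError("signals must have the same length")
    return sum((a - b) ** 2 for a, b in zip(original, reconstructed)) / len(original)


def kept_percentage(fncoef, oncoef):
    """Coefficients kept after compression, as a percentage of the original count."""
    return 100.0 * fncoef / oncoef


print(round(kept_percentage(124, 8192), 2))    # 1.51
print(round(kept_percentage(216, 16384), 2))   # 1.32
print(mse([1.0, 2.0, 3.0], [1.0, 2.5, 2.0]))   # about 0.4167
```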
3. Comments

Note the slightly higher speech quality mean grades assigned to code compcp.m,
which also keeps a slightly higher percentage of coefficients (i.e., it achieves a lower
compression ratio). Experiments showed that fixed thresholding, which selects 1% of the
coefficients, distorts voiced phonemes (e.g., /i/ ends up sounding like IvJ in the word
“nice”). The “after compression” spectrogram in the bottom right of Figure 6.8 shows that
the higher-formant section of the phoneme /i/ is not preserved by the compression. By
comparison, Figure 6.9 shows the results obtained using an adaptive thresholding scheme,
which selects more coefficients for the voiced segments while keeping a smaller total
percentage of coefficients (0.98%). The after-compression spectrogram in Figure 6.9 shows
that the high formants of the phoneme /i/ are better preserved, leading to a better
reconstruction of the voiced phoneme.
Parameters            Project  Cataratas  Encyclope  Issos   Assos   Be Nice  This Place
Code: NECOMPCP.M
  MSE                 0.0045   0.0315     0.0127     0.0074  0.0136  0.0070   0.0432
  ONCOEF              8192     8192       8192       8192    8192    16384    16384
  FNCOEF              124      100        104        100     100     210      216
  FNCOEF/ONCOEF (%)   1.51     1.22       1.27       1.22    1.22    1.28     1.32
  Speech quality      2.2      -          2.8        2.8     2.2     3.2      3.0
Code: COMPCP.M
  MSE                 0.0038   0.0313     0.0121     0.0057  0.0115  0.0052   0.0297
  ONCOEF              8192     8192       8192       8192    8192    16384    16384
  FNCOEF              181      109        126        129     115     139      228
  FNCOEF/ONCOEF (%)   2.21     1.33       1.54       1.57    1.40    0.85     1.39
  Speech quality      2.6      2.6        2.8        3.0     2.2     -        3.2

ONCOEF: original number of non-zero coefficients; FNCOEF: final number of non-zero
coefficients; speech quality: mean grade on the scale of Table 8.1 ("-": no grade given).

Table 8.2 Compression only results
C. DENOISING-COMPRESSION RESULTS

1.  Description 

Next,  we  consider  the  application  of  a  combined  denoising  and  compression 
scheme  designed  to  minimize  the  effects  of  narrowband  equipment  noise  in  a  few  isolated 
words  and  sentences.  The  isolated  words  used  in  these  tests  are: 

“Be”,  spoken  by  a  female  and  by  a  male  speaker; 

“Cats”,  spoken  by  a  female  speaker; 

“Hey”,  spoken  by  a  female  and  by  a  male  speaker; 

“Met”,  spoken  by  a  female  speaker;  and 

“Pay”,  spoken  by  a  female  speaker. 

The  sentences  used  are: 

“Hello,  my  name  is  Roberto,  today  is  Tuesday;”  and 

“Bye,  guys.  I’m  going  back  to  Brazil”,  both  spoken  by  a  male  speaker. 

Two different implementations of the denoising scheme are considered. The first
is implemented in ndencomp.m and the second in encp6.m (both are listed in the
Appendix). The noise identification and denoising process for each implementation is
described in Chapter VI, Section E, and in the Appendix. Details of the compression
scheme for both implementations can be found in the Appendix. Table 8.3 presents the
compression results for tests applied to the same “clean” words of the previous section,
but using the codes ndencomp.m and encp6.m.

2.  Results 

The parameters used to evaluate the denoising/compression scheme are identical
to those defined for the compression-only scheme, with the exception of the mean square
error (MSE). This parameter was omitted because the denoising step itself changes the
signal, so the difference between the original and the reconstructed signals becomes larger
and no longer reflects the compression distortion alone. The performance results are
presented in Table 8.4.
Parameters            Project  Cataratas  Encyclope  Issos   Assos   Be Nice  This Place
Code: NDENCOMP.M
  ONCOEF              8192     8192       8192       8192    8192    16384    16384
  FNCOEF              189      153        152        129     145     111      224
  FNCOEF/ONCOEF (%)   2.31     1.87       1.86       1.87    1.77    1.69     1.37
  Speech quality      2.5      3.0        3.0        3.2     2.6     3.3      3.2
Code: ENCP6.M
  ONCOEF              8192     8192       8192       8192    8192    16384    16384
  FNCOEF              188      149        152        129     143     270      264
  FNCOEF/ONCOEF (%)   2.29     1.82       1.86       1.86    1.75    1.65     1.61
  Speech quality      2.5      3.3        3.2        -       2.8     3.5      -

Table 8.3 Compression results utilizing codes ndencomp.m and encp6.m


The speech quality mean grade was computed for the following speech data: the
words “Be” (female speaker) and “Pay” (male speaker), and the sentences “Hello, my name
is Roberto ...” and “Bye, guys. I’m going back ...” These results are presented in Table
8.5.
Table 8.4 Denoising/compression results


SPEECH                        ndencomp   encp6
“Be” (female speaker)         2.2        -
“Pay” (male speaker)          2.6        2.6
“Hello, my name is ...”       3.2        3.2
“Bye, guys ...”               2.4        2.6

Table 8.5 Speech quality mean grades
3. Comments


We note that the overall speech quality mean grades in Table 8.3 are slightly
higher than those in Table 8.2. We also note that the FNCOEF/ONCOEF percentages in
Table 8.4 are small, since large sections of data are identified as noise-only and are
therefore not retained for compression by the denoising step. Results obtained for both
denoising/compression schemes show slightly better speech quality for the encp6.m
implementation than for the ndencomp.m implementation.

a. Word “Be”

Both denoising/compression schemes produce good results for the word “be”
for male and female speakers. The plots in Figure 8.1 show the efficiency of the
algorithm in both the time and frequency domains. The quality of the reconstructed speech
is good, as reflected by the grades assigned by five native listeners.

b.  Word  “Hey” 

For  male  and  female  cases,  both  denoising/compression  schemes  produce 
good  results  (Figures  8.2  and  8.3).  The  quality  of  the  reconstructed  speech  is  high,  as 
confirmed  by  the  listening  tests.  Note  that  the  /h/  sound  in  the  female  speech  has  a  higher 
frequency  than  that  of  the  male  voice.  The  denoising  schemes  also  allow  the  phoneme  /h/ 
to  be  differentiated  from  the  noisy  background  environment. 

d.  Word  “met” 

Both  denoising/compression  schemes  produce  a  good  reconstruction  of 
“/me/”  and  a  poor  reconstruction  of  the  phoneme  /t/,  which  is  reconstructed  sounding  like 
a  “/d/.”  This  degradation  is  due  to  the  combination  of  too  few  coefficients  kept  for 
compression  in  this  section  of  the  word  and  a  noisy  background. 
e. Word “Pay”

The reconstructed sound quality is better for the male case (Figures 8.4 and
8.5). Again, the poorer female reconstruction is due to too few coefficients being kept. The
sound /p/ has its two largest CP coefficients below 400 Hz and around 1,000 Hz,
respectively, so most of its energy is concentrated in the lower-frequency coefficients.
When the word is spoken by a female, the higher-frequency coefficients carry less energy
than some less important coefficients in the lower-frequency portion. For that reason,
fewer coefficients from the higher-frequency portion are kept, leading to a poorer sound
than in the male version.

f. Sentence “Hello, my name is Roberto, today is Tuesday”

The  spectrograms  in  Figure  8.6  show  that  the  main  signal  energy  is  preserved, 
and  that  denoising  occurs  in  the  correct  time  intervals. 

g. Sentence “Bye, guys, I’m going back to Brazil”

For this sentence, both denoising-plus-compression schemes result in a good
reconstruction. The spectrograms of Figure 8.7 show that the algorithm picks up the
important cosine packet coefficients. In this case, no significant amount of denoising was
done, due to the high quality of the original speech. However, it is worth comparing the
effects of the denoising schemes. Note that, with ndencomp.m, more intervals of the
resulting signal are identified as noisy than with encp6.m. In the listening tests for both
sentences, the mean grade assigned to the reconstruction using ndencomp.m is better than
the one assigned when using encp6.m. Basically, the unvoiced sounds were better
reconstructed by the former code, whereas the latter code produced some distortion,
leading to what some listeners called a mechanical sound.
D. ENCODING SCHEME RESULTS


1.  Description 

The data used for these tests consist of twelve speech sequences of lengths 8192
(ten words and/or sounds), 32768 (one sentence), and 65536 (one dialogue). The software
used for this set of tests includes voiced-unvoiced segmentation, denoising, compression,
and encoding steps. Both denoising/compression schemes are used in these tests. The
coding software is presented in the Appendix. The minimum window size is 16
milliseconds. The compression ratio between the total number of bits after encoding and
the total original number of bits is used to evaluate the performance of the encoding
scheme. The original number of bits is computed by multiplying the number of bits used
to represent each incoming sample (the samples had 8 bits and were PCM compressed)
by the original number of samples. For example, for each of the ten sequences of length
8192, the original number of bits is 8192 × 8 = 65,536 bits per speech sequence. The
following speech sequences are used in our tests:

(a)  “BE,”  spoken  by  a  female  speaker; 

(b)  “HEY,”  spoken  by  a  female  speaker; 

(c)  “MET,”  spoken  by  a  female  speaker; 

(d)  “PAY,”  spoken  by  a  female  speaker; 

(e)  “CATS,”  spoken  by  a  female  speaker; 

(f)  Word  “PROJECT,”  spoken  by  a  male  speaker; 

(g)  Word  “CATARATAS,”  spoken  by  a  male  speaker; 

(h)  Sound  or  partial  word  “ENCYCLOPE”,  spoken  by  a  male  speaker; 

(i) Sound “ASSOS,” spoken by a male speaker;

(j) Sound “ISSOS,” spoken by a male speaker;

(k)  Sentence  “Bye  guys.  I’m  going  back  to  Brazil,”  male  speaker; 

(l)  Dialogue  from  a  telephone  conversation,  male  and  female  speakers. 
2. Results


A measure of distortion is obtained by comparing the quality of the reconstructed
signal with that of the original speech signal (“speech quality,” graded according to
Table 8.1).

To  evaluate  the  efficiency  of  the  encoding  scheme,  the  following 
parameters  are  chosen: 

(1) The COMPRATIO, defined as one minus the ratio between the total
number of bits after compression and the total number of bits in the original signal;

(2)  The  mean  square  value  of  the  quantization  error. 
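A sketch of the bit accounting behind COMPRATIO, assuming 8 bits per original sample as stated above. The helper names are hypothetical, and the 1,024-bit compressed size in the usage lines is a made-up value chosen to give round numbers.

```python
def original_bits(n_samples, bits_per_sample=8):
    """Bits needed for the uncompressed PCM sequence."""
    return n_samples * bits_per_sample


def compratio(bits_after, bits_before):
    """COMPRATIO = 1 - (bits after compression / original bits)."""
    return 1.0 - bits_after / bits_before


def ratio_one_to(bits_after, bits_before):
    """The same result expressed as 1:N, with N = original bits / compressed bits."""
    return bits_before / bits_after


before = original_bits(8192)           # one 8192-sample word
print(before)                          # 65536
print(100 * compratio(1024, before))   # 98.4375
print(ratio_one_to(1024, before))      # 64.0
```

The 1:N form is what the later comparisons between 16-, 32-, and 64-level quantizers quote in parentheses.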

The performance results for the encoding scheme are presented in Table 8.6.
All results are based on the denoising/compression implementation ndencomp.m, except
for the words “hey” and “met,” which use encp6.m.


SPEECH               SPEECH QUALITY   COMPRATIO (%)   MSE
“Be”                 2.6              98.70           5.12e-?
“Hey”                3.2              98.56           9.32e-?
“Met”                3.2              98.85           5.93e-?
“Pay”                3.0              99.17           5.22e-?
“Cats”               3.4              98.59           1.01e-?
“Project”            3.2              97.87           1.06e-?
“Cataratas”          3.4              97.65           4.61e-?
“Encyclope”          3.8              97.64           2.86e-?
“Assos”              3.2              97.87           3.62e-?
“Issos”              4.0              98.05           2.16e-?
“Bye, guys...”       2.4              98.10           2.707e-?
Tel. conversation    2.6              98.06           2.843e-?

Table 8.6 Encoding results, 64-level quantizer.
The mean speech quality grade assigned is 3.2 (see the MOS scale in Table 8.1), which
corresponds to “fair” quality, with perceptible and slightly annoying distortion. We also
note the high compression ratios and the very small values of MSE, which here denotes the
mean square quantization error.

The compression ratio is calculated as follows. Eight bits are used for each
original sample of data, since that is the number used to load speech recordings into
“Matlab.” The total number of bits after encoding is computed by multiplying each average
codeword length from the Huffman coder by the corresponding number of samples in the
coefficients vector, as well as in the three vectors used to transmit the locations and the
window boundaries. The compression ratio is then computed from the ratio between this
final total number of bits and the original total number of bits. Comparing the
percentages from this encoding table with those from the previous sections (compression
and denoising/compression), we note that, although still very low, the percentages after
the encoding process are higher (~2%) than the others (~0.85%). The reason is that, in
addition to the cosine packet coefficients, the side information (i.e., the locations of those
coefficients) must also be encoded. Thus, even though the number of bits is reduced by
the quantization process, the additional information to be transmitted makes the final
percentage a little higher.
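The bit count described above, the Huffman average length of each vector times the number of symbols in that vector, summed over the coefficient vector and the three side-information vectors, can be sketched as follows. The vector labels and all numbers in the usage lines are hypothetical, chosen only to show the arithmetic.

```python
def total_bits(vectors):
    """Sum of (average Huffman codeword length x number of symbols) over vectors.

    vectors: iterable of (average_length_in_bits, number_of_symbols) pairs.
    """
    return sum(avg_len * count for avg_len, count in vectors)


# Hypothetical example: coefficients plus three side-information vectors
vectors = [
    (4.5, 120),  # quantized coefficients
    (6.0, 120),  # differential locations (DL)
    (2.0, 19),   # window depths
    (3.5, 19),   # window block indices
]
bits = total_bits(vectors)
print(bits)                          # 1364.5
print(round(100 * bits / 65536, 2))  # 2.08 (% of the original 65,536 bits)
```

In this made-up case the side information accounts for more than half of the bits, which illustrates why the encoded percentage ends up above the kept-coefficient percentage.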

As can be noted from the grades assigned, the results of the encoding process are
good. The only problem concerns the low-energy coefficients corresponding to unvoiced
sounds, which are degraded by the quantization and rounding processes. Figure 8.8 shows
the word “project,” which lost its weak final /kt/. Even when the quantizer is changed to
32 or 64 levels, the final sound cannot be recovered.

Figure 8.9 presents the sentence “Be nice to your sister,” using a 16-level
quantizer. We note that the sounds /s/ in “nice,” /t/ in “to,” and /r/ in “your” are lost.
However, when the quantizer is changed to 32 levels, the main parts of these sounds are
recovered (Figure 8.10). Finally, when the number of levels is increased to 64 (Figure
8.11), the sentence is fully reconstructed, and practically no difference is noted
between the original and reconstructed sounds.
Similar results are observed for the sentence “Bye guys, I’m going back to
Brazil.” The phoneme /z/ from “guys” is lost with a 16-level quantizer (Figure 8.12) and
is recovered with a 32-level quantizer (Figure 8.13). Similarly, when a 16-level quantizer
was applied, the sound /h/ in the word “hey” was reconstructed like a /k/, resulting in a
word sounding like “kay” (Figure 8.14). By changing to a 32-level quantizer, it was
possible to recover the correct sound (Figure 8.15). The sound was even better with a
64-level quantizer (Figure 8.16). Note the progressive increase in the coefficients
recovered in Figures 8.14 through 8.16, by comparing plots (d) and (f).

It is worth noting that doubling the number of quantizing levels does not
decrease the compression ratio significantly. For example, for the word “project” (with a
high SNR, an almost “clean” word), when the number of levels is increased from 16 to 32
(i.e., from 4 to 5 bits/symbol), the compression percentage changes from 98.36% (1:61.2)
to 98.28% (1:58.4). Another example is the word “hey” (also with a high SNR): the
compression percentages corresponding to the 16-level, 32-level, and 64-level quantizers
are 99.23% (1:130.4), 99.16% (1:119.2), and 99.11% (1:113.2), respectively. Thus, a
64-level quantizer is used as a good compromise between quality and compression.
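The quality side of this trade-off can be illustrated with a generic uniform quantizer on made-up data: each doubling of the number of levels adds one bit per symbol before entropy coding and roughly halves the quantization step, so the mean square quantization error falls by about a factor of four. This is not a reproduction of Table 8.6 or of quantx.m; the function names are hypothetical.

```python
import math
import random


def bits_per_symbol(levels):
    """Fixed-length bits needed to index `levels` quantizer outputs."""
    return math.ceil(math.log2(levels))


def quantization_mse(x, levels):
    """Mean square error of a uniform quantizer spanning min(x)..max(x)."""
    lo, hi = min(x), max(x)
    step = (hi - lo) / (levels - 1)
    err = [v - (lo + round((v - lo) / step) * step) for v in x]
    return sum(e * e for e in err) / len(x)


random.seed(0)
data = [random.uniform(-1.0, 1.0) for _ in range(10000)]
for levels in (16, 32, 64):
    print(levels, bits_per_symbol(levels), quantization_mse(data, levels))
```

After Huffman coding the actual bits per symbol are lower than these fixed-length values, which is consistent with the small change in compression ratio reported above.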


E.  COMPARISON  WITH  WAVELET  PACKET  TRANSFORM 

In this section, the Cosine Packet-based compression procedure is compared with
the Wavelet Packet-based procedure on clean (high-SNR) speech. A “clean” (high-SNR)
speech sequence is chosen, and only the compression stages are compared, since the
encoding scheme performs basically the same in both cases.

The sentence “Be nice to your sister” is compressed using the Cosine Packet
Transform, and the average percentage of non-zero coefficients selected from the original
number equals 0.85% for a good reconstruction of the speech. A much poorer
reconstruction results from the Wavelet Packet Transform using the “Daubechies(4)”
wavelet basis function. The WPT is implemented with the WaveLab package [17], using
the same criteria as those defined for the CPT implementation.

The results obtained for this sentence can be analyzed through the time and
frequency plots for both schemes given in Figures 8.17 and 8.18. We note that, with the
Wavelet Packet Transform, there are “holes” in the time domain, and that those “holes”
occur exactly at the intervals where the energy is lower, i.e., mainly at the unvoiced
sounds. This is because the WPT scheme initially splits the signal into given frequency
windows. In our example, only the largest 15% of the coefficients within given frequency
ranges are selected over the whole duration of the signal.

By comparison, the CPT splits the signal first in the time domain. Then, for each
time frame, a threshold is applied to the cosine packet coefficients. As a result,
although many fewer coefficients are selected, no time interval is left unrepresented. In
this scheme, the holes are instead in the frequency domain. However, since the transform
is good enough to detect the main frequencies contained in each locally stationary portion
of the signal, the few cosine packet coefficients preserved in each time interval are
sufficient for a good reconstruction of the speech. These results agree with the theoretical
expectation that the CPT is superior to the WPT for speech signal compression
applications.
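The "holes" argument can be illustrated with a toy example that is not a wavelet packet implementation: if coefficients are selected within each frequency band over the whole signal, a quiet (e.g., unvoiced) time window can end up with no coefficients at all, whereas a per-window selection, as in the CPT scheme, keeps at least one coefficient in every window. The coefficient values and function names below are made up for illustration.

```python
def per_band_selection(windows, fraction):
    """For each frequency index, keep the largest values across all time windows.

    A crude stand-in for selecting within frequency bands over the whole signal.
    """
    k = max(1, round(fraction * len(windows)))
    kept = set()
    for band in range(len(windows[0])):
        order = sorted(range(len(windows)), key=lambda w: abs(windows[w][band]), reverse=True)
        kept.update((w, band) for w in order[:k])
    return kept


def per_window_selection(windows, k=1):
    """Keep the k largest-magnitude coefficients inside every time window."""
    kept = set()
    for w, win in enumerate(windows):
        order = sorted(range(len(win)), key=lambda i: abs(win[i]), reverse=True)
        kept.update((w, i) for i in order[:k])
    return kept


def empty_windows(windows, kept):
    """Time windows in which no coefficient was kept (the "holes")."""
    used = {w for w, _ in kept}
    return [w for w in range(len(windows)) if w not in used]


windows = [
    [5.0, 0.5, 3.0, 0.2],   # loud frame, energy in bands 0 and 2
    [0.4, 4.0, 0.3, 2.5],   # loud frame, energy in bands 1 and 3
    [0.3, 0.2, 0.1, 0.05],  # quiet frame
]
print(empty_windows(windows, per_band_selection(windows, 0.15)))  # [2]
print(empty_windows(windows, per_window_selection(windows)))      # []
```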
[Figure 8.1 panels (a) to (d): time-domain plots and spectrograms; axes: time samples, normalized frequency]

Figure 8.1 Word “Be,” male non-native speaker, “ndencomp” implementation;
(a) Original time domain plot; (b) Spectrogram of original plot; (c) Time domain plot
after denoising/compression; (d) Spectrogram after denoising/compression (both
spectrograms use a Hanning time window of length 256 samples and overlapping of 128
samples between adjacent windows, fs = 8 kHz)
[Figure 8.2 panels (a) to (d): time-domain plots and spectrograms; axes: time samples, normalized frequency]

Figure 8.2 Word “Hey,” male non-native speaker, “ndencomp” implementation;
(a) Original time domain plot; (b) Spectrogram of original speech; (c) Time domain plot
after denoising/compression; (d) Spectrogram after denoising/compression (both
spectrograms use a Hanning time window of length 256 samples and overlapping of 128
samples between adjacent windows, fs = 8 kHz)
[Figure 8.3 panels (a) to (d): time-domain plots and spectrograms; axes: time samples, normalized frequency]

Figure 8.3 Word “Hey,” female non-native speaker, “ndencomp” implementation;
(a) Original time domain plot; (b) Spectrogram of original speech; (c) Time domain plot
after denoising/compression; (d) Spectrogram after denoising/compression (both
spectrograms use a Hanning time window of length 256 samples and overlapping of 128
samples between adjacent windows, fs = 8 kHz)
[Figure 8.4 panels (a) to (d): time-domain plots and spectrograms; axes: time samples, normalized frequency]

Figure 8.4 Word “Pay,” female non-native speaker, “ndencomp” implementation;
(a) Original time domain plot; (b) Spectrogram of original speech; (c) Time domain plot
after denoising/compression; (d) Spectrogram after denoising/compression (both
spectrograms use a Hanning time window of length 256 samples and overlapping of 128
samples between adjacent windows, fs = 8 kHz)
[Figure 8.5 panels (a) to (d): time-domain plots and spectrograms; axes: time samples, normalized frequency]

Figure 8.5 Word “Pay,” male non-native speaker, “ndencomp” implementation;
(a) Original time domain plot; (b) Spectrogram of original speech; (c) Time domain plot
after denoising/compression; (d) Spectrogram after denoising/compression (both
spectrograms use a Hanning time window of length 256 samples and overlapping of 128
samples between adjacent windows, fs = 8 kHz)
[Figure 8.6 panels (a) to (d): time-domain plots and spectrograms; axes: time samples, normalized frequency]

Figure 8.6 Sentence “Hello, my name is Roberto, today is Tuesday,” male non-native
speaker, “ndencomp” implementation; (a) Original time domain plot; (b) Spectrogram of
original speech; (c) Time domain plot after denoising/compression; (d) Spectrogram after
denoising/compression (both spectrograms use a Hanning time window of length 256
samples and overlapping of 128 samples between adjacent windows, fs = 8 kHz)


[Figure 8.7 panels (a) to (d): time-domain plots and spectrograms; axis: normalized frequency]

Figure 8.7 Sentence “Bye, guys, I’m going back to Brazil,” male non-native speaker,
“ndencomp” implementation; (a) Original time domain plot; (b) Spectrogram of
original speech; (c) Time domain plot after denoising/compression; (d) Spectrogram
after denoising/compression (both spectrograms use a Hanning time window of length
256 samples and overlapping of 128 samples between adjacent windows, fs = 8 kHz)
Figure 8.8 Word “Project,” male non-native speaker, “ndencomp” implementation;
(a) Original time domain plot; (b) Spectrogram of original speech; (c) Time domain plot
after denoising/compression; (d) Spectrogram after denoising/compression; (e) Time
domain plot after decoding, 16-level quantizer; (f) Spectrogram after decoding, 16-level
quantizer (both spectrograms use a Hanning time window of length 256 samples and
overlapping of 128 samples between adjacent windows, fs = 8 kHz)
Figure 8.9 Sentence “Be nice to your sister,” female native speaker, “ndencomp”
implementation; (a) Original time domain plot; (b) Spectrogram of original speech;
(c) Time domain plot after denoising/compression; (d) Spectrogram after
denoising/compression; (e) Time domain plot after decoding, 16-level quantizer;
(f) Spectrogram after decoding, 16-level quantizer (both spectrograms use a Hanning time
window of length 256 samples and overlapping of 128 samples between adjacent windows,
fs = 8 kHz)
Figure 8.10 Sentence “Be nice to your sister,” female native speaker, “ndencomp”
implementation; (a) Original time domain plot; (b) Spectrogram of original speech;
(c) Time domain plot after denoising/compression; (d) Spectrogram after
denoising/compression; (e) Time domain plot after decoding, 32-level quantizer;
(f) Spectrogram after decoding, 32-level quantizer (all spectrograms use a Hanning
window of length 256 samples with an overlap of 128 samples between adjacent
windows, fs = 8 kHz)
Figure 8.11 Sentence “Be nice to your sister,” female native speaker, “ndencomp”
implementation; (a) Original time domain plot; (b) Spectrogram of original speech;
(c) Time domain plot after denoising/compression; (d) Spectrogram after
denoising/compression; (e) Time domain plot after decoding, 64-level quantizer;
(f) Spectrogram after decoding, 64-level quantizer (all spectrograms use a Hanning
window of length 256 samples with an overlap of 128 samples between adjacent
windows, fs = 8 kHz)
Figure 8.12 Sentence “Bye, guys, I’m going back to Brazil,” male non-native speaker,
“ndencomp” implementation; (a) Original time domain plot; (b) Spectrogram of original
speech; (c) Time domain plot after denoising/compression; (d) Spectrogram after
denoising/compression; (e) Time domain plot after decoding, 16-level quantizer;
(f) Spectrogram after decoding, 16-level quantizer (all spectrograms use a Hanning
window of length 256 samples with an overlap of 128 samples between adjacent
windows, fs = 8 kHz)




[Figure: a) Original “Bye, guys,...”; d) After Denoising/Compression; f) After Decoding; axes: Time samples, Normalized Frequency]


Figure 8.13 Sentence “Bye, guys, I’m going back to Brazil,” male non-native speaker,
“ndencomp” implementation; (a) Original time domain plot; (b) Spectrogram of original
speech; (c) Time domain plot after denoising/compression; (d) Spectrogram after
denoising/compression; (e) Time domain plot after decoding, 32-level quantizer;
(f) Spectrogram after decoding, 32-level quantizer (all spectrograms use a Hanning
window of length 256 samples with an overlap of 128 samples between adjacent
windows, fs = 8 kHz)
Figure 8.14 Word “Hey,” female non-native speaker, “ndencomp” implementation;
(a) Original time domain plot; (b) Spectrogram of original speech; (c) Time domain plot
after denoising/compression; (d) Spectrogram after denoising/compression; (e) Time domain
plot after decoding, 16-level quantizer; (f) Spectrogram after decoding, 16-level quantizer
(all spectrograms use a Hanning window of length 256 samples with an overlap of
128 samples between adjacent windows, fs = 8 kHz)
Figure 8.15 Word “Hey,” female non-native speaker, “ndencomp” implementation;
(a) Original time domain plot; (b) Spectrogram of original speech; (c) Time domain plot
after denoising/compression; (d) Spectrogram after denoising/compression; (e) Time domain
plot after decoding, 32-level quantizer; (f) Spectrogram after decoding, 32-level quantizer
(all spectrograms use a Hanning window of length 256 samples with an overlap of
128 samples between adjacent windows, fs = 8 kHz)
Figure 8.16 Word “Hey,” female non-native speaker, “ndencomp” implementation;
(a) Original time domain plot; (b) Spectrogram of original speech; (c) Time domain plot
after denoising/compression; (d) Spectrogram after denoising/compression; (e) Time domain
plot after decoding, 64-level quantizer; (f) Spectrogram after decoding, 64-level quantizer
(all spectrograms use a Hanning window of length 256 samples with an overlap of
128 samples between adjacent windows, fs = 8 kHz)
Figure 8.17 Sentence “Be nice to your sister,” female native speaker, compressed
with the CPT, “ndencomp” implementation; (a) Original time domain plot;
(b) Spectrogram of original speech; (c) Time domain plot after denoising/compression with
0.85% non-zero coefficients selected; (d) Spectrogram after denoising/compression (both
spectrograms use a Hanning window of length 256 samples with an overlap of 128
samples between adjacent windows, fs = 8 kHz)
Figure 8.18 Sentence “Be nice to your sister,” female native speaker, compressed
with the WPT using a “Daubechies” basis function; (a) Original time domain plot;
(b) Spectrogram of original speech; (c) Time domain plot after denoising/compression with
15% non-zero coefficients selected; (d) Spectrogram after compression (both spectrograms
use a Hanning window of length 256 samples with an overlap of 128 samples between
adjacent windows, fs = 8 kHz)
IX. CONCLUSION


In  this  thesis,  compression  schemes  based  on  the  Cosine  Packet  Transform  using 
the  Local  Cosine  Transform  are  presented.  The  basis  functions  are  chosen  via  the  Best 
Basis  Algorithm  using  the  entropy  minimization  criterion. 
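The bottom-up pruning behind a best basis search can be sketched as follows. This is a minimal illustration, not the WaveLab code used in this thesis: it assumes an additive cost has already been computed for every node (d, b) of a complete binary cosine packet tree (here the Coifman-Wickerhauser entropy -Σ x² log x², with zero coefficients skipped), and it keeps a parent node whenever its cost does not exceed the combined best cost of its two children. The names `entropy_cost` and `best_basis` are hypothetical.

```python
import math


def entropy_cost(coeffs):
    """Additive entropy cost -sum(x^2 * log(x^2)); zero coefficients add nothing."""
    return -sum(c * c * math.log(c * c) for c in coeffs if c != 0.0)


def best_basis(costs, max_depth):
    """Bottom-up best-basis search over a complete binary tree.

    costs maps (d, b) -> cost of packet b at depth d, for d = 0..max_depth and
    b = 0..2**d - 1. Returns (total_cost, selected_nodes); the selected nodes
    tile the time axis and are listed from left to right.
    """
    # The finest packets start out as their own best basis.
    best = {(max_depth, b): (costs[(max_depth, b)], [(max_depth, b)])
            for b in range(2 ** max_depth)}
    # Move up one level at a time, keeping a parent unless its children are cheaper.
    for d in range(max_depth - 1, -1, -1):
        for b in range(2 ** d):
            left_cost, left_nodes = best[(d + 1, 2 * b)]
            right_cost, right_nodes = best[(d + 1, 2 * b + 1)]
            children_cost = left_cost + right_cost
            if costs[(d, b)] <= children_cost:
                best[(d, b)] = (costs[(d, b)], [(d, b)])
            else:
                best[(d, b)] = (children_cost, left_nodes + right_nodes)
    return best[(0, 0)]
```

For example, with toy costs {(0, 0): 5.0, (1, 0): 1.0, (1, 1): 3.0, (2, 0): 0.25, (2, 1): 0.25, (2, 2): 2.0, (2, 3): 2.0}, `best_basis(costs, 2)` returns (3.5, [(2, 0), (2, 1), (1, 1)]): the left half of the signal is split once more, while the right half is kept as a single interval.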

Coefficients for compression are chosen with an adaptive scheme, which selects
more cosine packet coefficients for voiced intervals than for unvoiced ones. In addition,
since some recorded speech sounds contain equipment noise, a denoising scheme is
applied.
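As an illustration of interval-adaptive coefficient selection, the sketch below keeps a larger fraction of the largest-magnitude coefficients in intervals labeled voiced than in intervals labeled unvoiced. The keep fractions (2% and 0.5%) and the function names are hypothetical choices; the appendix code delegates this step to a routine `comp` whose definition is not reproduced here, and its percentage arguments may have a different meaning.

```python
def keep_largest(coeffs, keep_fraction):
    """Zero all but the round(keep_fraction * len(coeffs)) largest-magnitude coefficients (at least one)."""
    k = max(1, round(keep_fraction * len(coeffs)))
    # Positions of the k coefficients with the largest absolute value.
    ranked = sorted(range(len(coeffs)), key=lambda j: abs(coeffs[j]), reverse=True)
    keep = set(ranked[:k])
    return [c if j in keep else 0.0 for j, c in enumerate(coeffs)]


def adaptive_threshold(intervals, voiced_flags, voiced_keep=0.02, unvoiced_keep=0.005):
    """Threshold each interval with a keep fraction chosen by its voiced/unvoiced label.

    The default fractions are hypothetical; they only encode "keep more when voiced".
    """
    return [keep_largest(coeffs, voiced_keep if voiced else unvoiced_keep)
            for coeffs, voiced in zip(intervals, voiced_flags)]
```

With two 200-coefficient intervals, one voiced and one unvoiced, this keeps 4 and 1 coefficients respectively.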

Finally,  an  encoding  scheme  is  implemented.  Thus,  this  study  simulates  the  entire 
process  of  denoising,  compression,  and  encoding  (on  the  transmitter  side),  as  well  as 
decoding  and  reconstruction  (on  the  receiver  side). 

The good results obtained are due to a combination of factors, which include the
following:

(a)  Good  time  and  frequency  resolution  of  the  local  cosine  transform; 

(b) The Cosine Packet Transform, combined with the Best Basis algorithm using
the entropy minimization criterion, allows not only minimizing the entropy but also
splitting the signal into its locally stationary portions. These two factors contribute
greatly to the success of the compression scheme;

(c) The Adaptive Thresholding scheme helps to optimize both the number and the choice
of cosine packet coefficients, while preserving good properties of the compressed
signal;

(d) The denoising scheme reduces the number of non-zero coefficients and, at the same
time, yields better sound quality than the original noisy speech.
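Item (a) rests on the smooth, orthonormal bells used by the local cosine transform. A common rising cutoff is r(t) = sin(π/4 · (1 + t)) on [-1, 1], which satisfies r(t)² + r(-t)² = 1; this condition is what allows neighboring folded windows to overlap without losing energy. The sketch below samples such a cutoff and checks the condition numerically. It illustrates the standard construction only; the appendix code builds its bell with WaveLab's `MakeONBell('Sine', m)`, and this sketch does not claim to reproduce those samples exactly. All function names are hypothetical.

```python
import math


def sine_rising_cutoff(t):
    """Rising cutoff r(t) = sin(pi/4 * (1 + t)) on [-1, 1]; 0 to the left, 1 to the right."""
    if t <= -1.0:
        return 0.0
    if t >= 1.0:
        return 1.0
    return math.sin(math.pi / 4.0 * (1.0 + t))


def sampled_bell(m):
    """Sample r at the midpoints t_j = (j + 0.5)/m, j = 0..m-1, and at their mirror images -t_j."""
    plus = [sine_rising_cutoff((j + 0.5) / m) for j in range(m)]
    minus = [sine_rising_cutoff(-(j + 0.5) / m) for j in range(m)]
    return plus, minus


def orthogonality_error(m):
    """Largest deviation of r(t)^2 + r(-t)^2 from 1 over the sampled points."""
    plus, minus = sampled_bell(m)
    return max(abs(p * p + q * q - 1.0) for p, q in zip(plus, minus))
```

The identity holds because π/4 · (1 - t) = π/2 - π/4 · (1 + t), so the two squared terms are sin² and cos² of the same angle.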

Through the denoising attempts, it is possible to recognize some speech patterns
that would be hidden by the higher-energy noise in regular compression. The
frequency analysis allows speech sounds to be differentiated from background noise and
hence permits recovering most of the speech sounds.
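The denoising routines themselves (encp6.m and ndencomp.m) are not reproduced in this section. As a generic stand-in, the sketch below shows one widely used way of separating low-energy noise from speech in a transform domain: Donoho-Johnstone hard thresholding at the universal threshold σ√(2 ln N), with the noise level σ estimated as the median absolute coefficient divided by 0.6745. This is a swapped-in technique for illustration, not the scheme used in the thesis; the function names are hypothetical.

```python
import math
import statistics


def universal_threshold(coeffs):
    """Donoho-Johnstone threshold sigma * sqrt(2 ln N), with sigma = median(|x|) / 0.6745."""
    sigma = statistics.median([abs(c) for c in coeffs]) / 0.6745
    return sigma * math.sqrt(2.0 * math.log(len(coeffs)))


def hard_threshold(coeffs, threshold):
    """Keep coefficients whose magnitude exceeds the threshold and zero the rest."""
    return [c if abs(c) > threshold else 0.0 for c in coeffs]
```

For seven noise-like coefficients of magnitude 0.1 and one speech-like coefficient of 5.0, the threshold is about 0.30, so only the 5.0 survives. The same mechanism also explains the loss of weak unvoiced endings discussed below: a genuine speech component whose energy is comparable to the noise falls under the threshold as well.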

Basically, two main problems remain. First, a few low-energy sounds need to be
correctly identified and recovered from the background noise. In the experiments with
noisy speech, the only cases that could not be solved are the weak unvoiced endings, such
as /t/ at the end of the word “met” (which is reconstructed like another phoneme) and /s/
at the end of the words “cats” and “lets,” which is lost in the denoising process. Although
many phonemes were tried, there are probably others that could have been attempted;
this is a suggestion for further study. The second problem encountered is quantization
noise. Although the encoding scheme works well enough that, in many cases, the speech
at the simulated receiver sounds “cleaner” than the original noisy signal, the quantization
process introduces noise. Although very small, this noise is enough to cancel endings
like /kt/ in the word “project.” Since this research focused on the compression schemes,
less effort was made to develop better quantizing and encoding schemes (another point
for further study).
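The quantization noise described above can be illustrated with a uniform mid-rise quantizer with L levels over [-A, A], the kind of 16-, 32- or 64-level quantizer referred to in the figures of Chapter VIII. The exact quantizer used in the thesis is not shown in this section, so the step size Δ = 2A/L, the mid-cell reconstruction points, the clipping rule and the function names are all assumptions.

```python
import math


def quantize(x, levels, amax):
    """Map x to an integer index 0..levels-1 of a uniform mid-rise quantizer on [-amax, amax]."""
    step = 2.0 * amax / levels
    index = math.floor((x + amax) / step)
    return min(max(index, 0), levels - 1)  # clip values at or beyond the range limits


def dequantize(index, levels, amax):
    """Reconstruct at the midpoint of the index's cell."""
    step = 2.0 * amax / levels
    return -amax + (index + 0.5) * step


def max_error(samples, levels, amax):
    """Largest absolute reconstruction error over the samples."""
    return max(abs(s - dequantize(quantize(s, levels, amax), levels, amax)) for s in samples)
```

Inside the range the error never exceeds Δ/2, so each doubling of L (16, 32, 64 levels) halves the worst-case error; a low-energy coefficient smaller than Δ/2 can therefore be reconstructed with a large relative error, which is consistent with the loss of weak endings noted above.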

In these experiments, the CPT performed better than the WPT for speech compression.
When using the WPT, the compression scheme begins losing low-energy sounds much
earlier than with the CPT, i.e., at a much lower compression ratio (compare Figures 8.17
and 8.18, where the CPT keeps 0.85% and the WPT 15% non-zero coefficients), although
this may be due to the basis function that was selected.

The purpose of this study is to find an optimal scheme for the compression of
speech signals. Since the scheme used in this study proved successful, speech samples
were tested at the highest possible compression ratios. For the majority of trials, the
reconstruction quality can be considered “fair” (see Table 8.1), as shown by the average
mean grades assigned. The very small percentages of selected coefficients in the
compression scheme result tables and the very high compression ratios in the encoding
results, together with a “fair” reconstruction quality, indicate a positive overall result. The
compression ratios are not fixed, since the scheme adapts to the speech being analyzed.
However, our results indicate an average compression ratio of 1:50 on the speech used
in our study. The ratio can be adjusted for better quality of the reconstructed sound,
according to the needs and resources of the user. Evidently, there will always be a
trade-off between the compression ratio and the quality of the reconstructed speech.
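To make such a ratio concrete, one simple accounting, assumed here for illustration and not taken from the thesis, compares the bits of the original PCM samples with the bits needed to send only the retained coefficients, each as a quantizer index plus its position within the frame. The function name and the cost model are hypothetical.

```python
import math


def compression_ratio(n_samples, bits_per_sample, n_kept, quantizer_levels):
    """Original PCM bits divided by the bits for the kept coefficients.

    Each kept coefficient is charged ceil(log2(levels)) bits for its quantizer index
    plus ceil(log2(n_samples)) bits for its position (hypothetical accounting).
    """
    index_bits = math.ceil(math.log2(quantizer_levels))
    position_bits = math.ceil(math.log2(n_samples))
    original_bits = n_samples * bits_per_sample
    compressed_bits = n_kept * (index_bits + position_bits)
    return original_bits / compressed_bits
```

With hypothetical values of 16384 samples at 16 bits per sample and 164 retained coefficients coded with a 64-level quantizer, the function gives 262144 / 3280, roughly 1:80. Keeping more coefficients or using more quantizer levels lowers the ratio, which is the trade-off described above.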
APPENDIX. COMPUTER CODE


% Name: Compcp.m and Necompcp.m

% Subject: Analysis, compression and synthesis routines for speech data
% Description:

% These two routines contain the following main parts:

% a) Input and loading of the speech to be used (prompts the user for choices such as the
% gender of the speaker, the word or sentence among those available, and the finest depth
% for time splitting);

% b) Implements the Cosine Packet Transform (CPT) of the speech sequence;

% c) Chooses the basis for the CPT by applying the Best Basis Algorithm;

% d) Implements a Frequency Behavior and an Energy Behavior plot;

% e) Implements a voiced-unvoiced segmentation;

% f) Selects the coefficients by applying the Adaptive Thresholding scheme;

% g) Applies the inverse CPT by transforming each interval, unfolding it and adding it to the
% existing sequence;

% h) Computes and presents the number of non-zero coefficients before and after the
% compression scheme, as well as the mean square error between the original and the
% reconstructed sequences;

% i) Presents plots of the Frequency and the Energy behavior; also presents the
% voiced-unvoiced segmentation plot, as well as time domain and spectrogram plots of both
% original and reconstructed sequences;

% Note 1: Parts b), c) and g) are extracted from the software package WaveLab.600, Stanford
% University [17]. This is also valid for the programs encp6.m, ndencomp.m,
% encptour.m and ndentour.m;

% Note 2: WaveLab code was modified to implement our compression schemes.

%  Written  and  adapted  by  J.  Roberto  V.  Martins,  in  October  1995. 
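Steps (b) and (g) both rest on the type-IV discrete cosine transform, which the code below calls as `dct_iv`. As a minimal illustration (a direct O(N²) Python sketch, not WaveLab's fast implementation), the orthonormal DCT-IV with scaling √(2/N) is its own inverse, which is why the same routine can serve both for analysis and for the synthesis in step (g). That WaveLab's `dct_iv` uses exactly this normalization is an assumption here.

```python
import math


def dct_iv(x):
    """Orthonormal DCT-IV: X[k] = sqrt(2/N) * sum_n x[n] * cos(pi/N * (n + 1/2) * (k + 1/2))."""
    n_pts = len(x)
    scale = math.sqrt(2.0 / n_pts)
    return [scale * sum(x[n] * math.cos(math.pi / n_pts * (n + 0.5) * (k + 0.5))
                        for n in range(n_pts))
            for k in range(n_pts)]
```

Applying `dct_iv` twice returns the input up to rounding error, and the transform preserves energy, so the sum of squared coefficients equals the energy of the interval.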

%  Compcp.m 

%  Input  and  loading  of  speech  to  be  used 
clear; 

V = input('Please enter "1" for a female voice and "2" for a male voice: ');
if V == 1
  P = 2;

  FV = input(['Please enter 1 for the sentence, 2 for "be", 3 for "hate", 4 for "hey", 5 for "met", ' ...
              '6 for "pay", 7 for "cats", 8 for "benice": ']);
  if FV == 1
    clear ny;
    load fse;
    ny = [fse' zeros(1,5120)];
  elseif FV == 2
    clear ny;
    load fbe;
    ny = [fbe' zeros(1,2048)];
  elseif FV == 3
    clear ny;
    load fha;
    ny = [fha' zeros(1,7168)];
  elseif FV == 4
    clear ny;
    load fhey;
    ny = [fhey'];
  elseif FV == 5
    clear ny
    load fmet;
    ny = [fmet' zeros(1,1024)];
  elseif FV == 6
    clear ny
    load fpay
    ny = [fpay' zeros(1,1024)];
  elseif FV == 7
    clear ny
    load fcats
    ny = [fcats'];
  elseif FV == 8
    clear ny
    benice = loadwav('benice.wav');
    ny = [(benice(1:16384)/max(abs(benice))+0.0119)'];
  end
end

if V == 2
  P = 2;

  W = input(['Please enter 1 for "project", 2 for "cataratas", 3 for "encyclopedia", 4 for "issos", ' ...
             '5 for "assos", 6 for "six", 7 for "the sentence", 8 for "aka", 9 for "at", 10 for "azure", ' ...
             '11 for "be", 12 for "bird", 13 for "boot", 14 for "call", 15 for "day", 16 for "eka", ' ...
             '17 for "epa", 18 for "eve", 19 for "father", 20 for "foot", 21 for "for", 22 for "go", ' ...
             '23 for "hate", 24 for "he", 25 for "ika", 26 for "it", 27 for "key", 28 for "let", 29 for "me", ' ...
             '30 for "met", 31 for "no", 32 for "obey", 33 for "opa", 34 for "pay", 35 for "read", ' ...
             '36 for "see", 37 for "she", 38 for "then", 39 for "thin", 40 for "to", 41 for "up", ' ...
             '43 for "vote", 44 for "we", 45 for "you", 46 for "zoo", 47 for "silence", ' ...
             '48 for "the bye sentence", 49 for "beback", 50 for "blows", 51 for "bruna", 52 for "adams", ' ...
             '53 for "sounds good": ']);

  if W == 1
    clear ny;
    load newvoice;
    ny = y(2700:2700+8191)';
  elseif W == 2
    clear ny;
    load catar;
    ny = ca(1900:1900+8191)';
  elseif W == 3
    clear ny;
    load encic;
    ny = en(1200:1200+8191)';
  elseif W == 4
    clear ny;
    load issos
    ny = is(1900:1900+8191)';
  elseif W == 5
    clear ny
    load assos
    ny = as(1900:1900+8191)';
  elseif W == 6
    clear ny
    load six
    ny = si(1:8192)';
  elseif W == 7
    clear ny;
    load myvoice;
    ny = x(9000:9000+32767)';
  elseif W == 8
    clear ny;
    load aka;
    ny = (ac+0.1656)';
  elseif W == 9
    clear ny;
    load at;
    ny = (at+0.1655)';
  elseif W == 10
    clear ny;
    load azure;
    ny = [(az+0.1651)' zeros(1,6144)];
  elseif W == 11
    clear ny;
    load be;
    ny = [(be+0.1654)' zeros(1,3072)];
  elseif W == 12
    clear ny;
    load bird;
    ny = [(bi+0.1658)' zeros(1,7168)];
  elseif W == 13
    clear ny;
    load boot;
    ny = [(bo+0.1652)'];
  elseif W == 14
    clear ny;
    load call;
    ny = [(cal+0.1654)' zeros(1,6144)];
  elseif W == 15
    clear ny;
    load day;
    ny = [(da+0.1645)' zeros(1,1024)];
  elseif W == 16
    clear ny;
    load eka;
    ny = [(ek+0.1653)'];
  elseif W == 17
    clear ny;
    load epa;
    ny = [(ep+0.1650)' zeros(1,6144)];
  elseif W == 18
    clear ny;
    load eve;
    ny = [(ev+0.1654)' zeros(1,4096)];
  elseif W == 19
    clear ny;
    load father;
    ny = [(fa+0.1648)' zeros(1,6144)];
  elseif W == 20
    clear ny;
    load foot;
    ny = [(foo+0.1653)' zeros(1,6144)];
  elseif W == 21
    clear ny;
    load for;
    ny = [(fo+0.1649)' zeros(1,6144)];
  elseif W == 22
    clear ny;
    load go;
    ny = [(go+0.1651)'];
  elseif W == 23
    clear ny;
    load hate;
    ny = [(ha+0.1657)' zeros(1,7168)];
  elseif W == 24
    clear ny;
    load he
    ny = [(he+0.1657)' zeros(1,2048)];
  elseif W == 25
    clear ny;
    load ika
    ny = [(ik+0.1654)' zeros(1,6144)];
  elseif W == 26
    clear ny;
    load it
    ny = [(it+0.1657)' zeros(1,3072)];
  elseif W == 27
    clear ny;
    load key;
    ny = [(ke + 0.1652)' zeros(1,2048)];
  elseif W == 28
    clear ny;
    load let;
    ny = [(le + 0.1657)'];
  elseif W == 30
    clear ny;
    load met;
    ny = [(met + 0.1653)'];
  elseif W == 31
    clear ny;
    load no;
    ny = [(no + 0.1646)' zeros(1,1024)];
  elseif W == 34
    clear ny;
    load pay;
    ny = [(pa + 0.1655)' zeros(1,1024)];
  elseif W == 36
    load see;
    ny = [(se+0.1653)' zeros(1,1024)];
  elseif W == 37
    load she;
    ny = [(sh+0.1654)'];
  elseif W == 38
    load then;
    ny = [(th+0.1656)' zeros(1,6144)];
  elseif W == 39
    load thin;
    ny = [(thi + 0.1655)' zeros(1,1024)];
  elseif W == 40
    load to;
    ny = [(to + 0.1649)' zeros(1,3072)];
  elseif W == 41
    load up;
    ny = [(up + 0.1653)' zeros(1,2048)];
  elseif W == 43
    load vote;
    ny = [(vo + 0.1654)' zeros(1,1024)];
  elseif W == 44
    clear ny;
    load we
    ny = [(we+0.1655)' zeros(1,2048)];
  elseif W == 45
    clear ny;
    load you
    ny = [(you + 0.1655)' zeros(1,2048)];
  elseif W == 46
    clear ny;
    load zoo
    ny = [(zo+0.1646)' zeros(1,2048)];
  elseif W == 47
    clear ny;
    load myvoice;
    ny = x(1:8192)';
  elseif W == 48
    clear ny;
    load bye;
    ny = [bye' zeros(1,9216)];
  elseif W == 49
    clear ny;
    beback = loadwav('beback.wav');
    ny = [(beback/max(abs(beback))+0.056)' zeros(1,4824)];
  elseif W == 50
    clear ny;
    blows = loadwav('blows.wav');
    ny = [(blows(1:16384)/max(abs(blows))+0.0034)'];
  elseif W == 51
    clear ny;
    br = loadwav('bruna.wav');
    ny = [(br/max(abs(br)) + 0.0155)' zeros(1,7268)];
  elseif W == 52
    clear ny;
    adam = loadwav('adamsfam.wav');
    ny = [(adam(1:32768)/max(abs(adam)) + 0.0081)'];
  elseif W == 53
    clear ny;
    load eng16;
    ny = [(eng16(1:16384) + 3.019e-4)'];
  end
end

n  =  length(ny) 

D  =  input('Enter  the  finest  depth  for  Time  Splitting : '); 

%  Implementing  the  Cosine  Packet  Transform 

cp  =  CPAnalysis(ny,D,'Sine'); 
stree  =  CalcStatTree(cp, 'Entropy'); 

[btree,vtree]  =  BestBasis(stree,D); 

[n,L]  =  size(cp); 

%  Create  Bell 


bellname  =  'Sine'; 
m = n/2^D/2;

[bp,bm] = MakeONBell(bellname,m);

x = zeros(1,n);

% initialize tree traversal stack

stack = zeros(2,2^D+1);
tp = zeros(1,n);

v = zeros(1,n);
vs = zeros(1,n);
compr = zeros(1,n);
coef = zeros(1,n);
nncoef = zeros(1,n);

k = 1;

stack(:,k) = [0 0]';

ind = 0;

le = zeros(1,2^D);
while (k > 0),

  d = stack(1,k); b = stack(2,k); k = k-1;
  if (btree(node(d,b)) ~= 0), % nonterminal node
    k = k+1; stack(:,k) = [(d+1) (2*b)]';
    k = k+1; stack(:,k) = [(d+1) (2*b+1)]';
  else

    c = cp(packet(d,b,n),d+1)';
    coef(1,b/(2^d)*n+1:(b+1)/(2^d)*n) = c;
    i = (b/(2^d)*n+1);
    len = length(c);

    [I,ND] = max(abs(c));
    compr(1,b/(2^d)*n+1) = length(c);

    % Identifying the Frequency Content of each interval

    if ND <= round(len/16)
      v(i) = 0.25; % it was 0.2
    elseif ND <= round(len/8)
      v(i) = 0.5; % it was 0.4
    elseif ND < length(c)/(2*P)
      v(i) = 1; % it was 0.6
    else
      [sI,sND] = max(abs([coef(i:i+ND-3),0,0,0,coef(i+ND+1:i+len-1)]));
      if sND >= length(c)/(2*P)
        if ND <= len/2
          v(i) = 1.5; % it was 0.75
        elseif ND <= len*3/4
          v(i) = 2; % it was 0.9
        else
          v(i) = 2.5; % it was 1.0
        end
      elseif sND > round(len/8)
        v(i) = 1; % it was 0.6
      elseif sND > round(len/16)
        v(i) = 0.5; % it was 0.4
      else
        v(i) = 0.25; % it was 0.2
      end
    end

    ec(i:i+len-1) = ones(1,len) .* sum(c.^2); % computing the energy of the coefficients
    es(i:i+len-1) = ones(1,len) .* sum(ny(i:i+len-1).^2); % computing the energy of the intervals
    vari = std(c);

    tp(1,b/(2.^d).*n+1) = 1;
    len = length(c);
    ind = ind+1;
    le(ind) = log2(len);
    v(i);

    rko = length(c)/16;
    ko = ND;
    fo = 4000/length(c)*ND;
    toten = sum(coef.^2);
    i = (b/(2^d)*n+1);

    % Applying the Adaptive Thresholding Compression Scheme
    if v(i) <= 0.5
      if sum(coef(i:compr(i)-1+i).^2) < toten/n * len
        if len < 2*n/(2^D)
          nncoef(i:compr(i)-1+i) = comp((coef(i:compr(i)-1+i)),99.5);
        else
          nncoef(i:compr(i)-1+i) = comp((coef(i:compr(i)-1+i)),99.5);
        end
      else
        if len < 2*n/(2^D)
          nncoef(i:compr(i)-1+i) = comp((coef(i:compr(i)-1+i)),98.7);
        else
          nncoef(i:compr(i)-1+i) = comp((coef(i:compr(i)-1+i)),97.66);
        end
      end

      nc = nncoef(i:compr(i)-1+i);
    end

    if v(i) > 0.5
      sumco = sum(coef(i:compr(i)-1+i).^2);
      thres = 0.5*toten/n * len;
      if sum(coef(i:compr(i)-1+i).^2) < toten/n * len
        if len < 2*n/(2^D)
          nncoef(i:compr(i)-1+i) = comp((coef(i:compr(i)-1+i)),99.5);
        else
          nncoef(i:len+i-1) = comp(coef(i:len+i-1),99.5);
        end
      else
        if len < 2*n/(2^D)
          nncoef(i:compr(i)-1+i) = comp((coef(i:compr(i)-1+i)),99.5);
        else
          nncoef(i:compr(i)-1+i) = comp(coef(i:compr(i)-1+i),99.5);
        end
      end

      nc = nncoef(i:compr(i)-1+i);
    end

    % Voiced-unvoiced segmentation
    if v(i) > 1
      vs(i) = 1;
    else
      if es(i:i+len-1) > (toten/n*2.5*len)
        vs(i) = 0.5;
      end
    end

    y = dct_iv(nc); % Inverse transforming each interval

    % Unfolding each interval and reconstructing the time sequence after compression

    [xc,xl,xr] = unfold(y,bp,bm);
    x(packet(d,b,n)) = x(packet(d,b,n)) + xc;
    if b > 0,
      x(packet(d,b-1,n)) = x(packet(d,b-1,n)) + xl;
    else
      x(packet(d,0,n)) = x(packet(d,0,n)) + edgeunfold('left',xc,bp,bm);
    end

    if b < 2^d-1,
      x(packet(d,b+1,n)) = x(packet(d,b+1,n)) + xr;
    else
      x(packet(d,b,n)) = x(packet(d,b,n)) + edgeunfold('right',xc,bp,bm);
    end
  end

end

nind = sum(le>0);
nle = le(1:nind);
figure(1),plot(ny), hold on;
plot(tp,':'), hold off;
figure(2),plot(ny),hold on
plot(v,':'), hold off;

mse = mean((ny - x).^2) % computing the mean square error between the original and
% the reconstructed sequence;

scoefinO = sum(abs(coef)>0) % computing the number of non-zero coefficients before
% compression

sncoefinO = sum(abs(nncoef)>0) % computing the number of non-zero coefficients after
% compression


figure(3),
plot(x);

figure(4),
plot(ec);

figure(5),
plot(es);

figure(6),specgram(ny,[],1)
title('Observing the Coarticulation for the sound "ISSOS"')
print -depsc figure6

figure(7),
subplot(3,1,1),plot(ny)
title('Speech Signal: "ISSOS"')
subplot(3,1,2),plot(ny), hold on;
plot(tp,':'),hold off;title('Time Partition')
subplot(3,1,3),plot(ny),hold on
plot(v,':'),hold off;title('Frequency Behavior')

figure(8),
subplot(3,1,1),plot(ny,'b'),
%plot(v,':'), hold off;
title('"Be nice to your sister"')
subplot(3,1,2),plot(vs,'b')
title('Voiced-Unvoiced Segmentation')
subplot(3,1,3),
specgram(ny,[],1)
title('Observing the Spectrogram for "Be nice to your sister"')
print -depsc figure8
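The first-level frequency labels assigned in Compcp.m can be read in hertz. With fs = 8 kHz, the coefficient at 1-based position ND in an interval of len coefficients corresponds roughly to fo = 4000 · ND / len Hz, the quantity the code stores in `fo`. With P = 2, the first three tests therefore ask whether the dominant coefficient lies at or below about 250 Hz, at or below about 500 Hz, or below 1 kHz. The Python sketch below mirrors only those first-level tests (the secondary-peak refinement is omitted); the function names are hypothetical.

```python
import math


def matlab_round(x):
    """Round half away from zero for non-negative x, as MATLAB's round() does."""
    return math.floor(x + 0.5)


def peak_frequency_hz(coeffs, fs=8000.0):
    """Return (fo, ND): fo = (fs/2) * ND / len, ND the 1-based index of the largest |coefficient|."""
    nd = max(range(len(coeffs)), key=lambda j: abs(coeffs[j])) + 1  # first maximum, like MATLAB's max
    return (fs / 2.0) * nd / len(coeffs), nd


def first_level_label(coeffs, p=2):
    """Mirror the first three tests of Compcp.m; None means the secondary-peak test would follow."""
    length = len(coeffs)
    _, nd = peak_frequency_hz(coeffs)
    if nd <= matlab_round(length / 16):  # peak at or below about 250 Hz when fs = 8 kHz
        return 0.25
    if nd <= matlab_round(length / 8):   # at or below about 500 Hz
        return 0.5
    if nd < length / (2 * p):            # below about 1 kHz when p = 2
        return 1.0
    return None
```

For a 64-coefficient interval whose largest coefficient sits at position 3, `peak_frequency_hz` gives 187.5 Hz and the label is 0.25, the lowest frequency class.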


%  Necompcp.m 

%  Input  and  loading  of  speech  to  be  used 
clear; 

V = input('Please enter "1" for a female voice and "2" for a male voice: ');
if V == 1
  P = 2;

  FV = input(['Please enter 1 for the sentence, 2 for "be", 3 for "hate", 4 for "hey", 5 for "met", ' ...
              '6 for "pay", 7 for "cats", 8 for "benice": ']);
  if FV == 1
    clear ny;
    load fse;
    ny = [fse' zeros(1,5120)];
  elseif FV == 2
    clear ny;
    load fbe;
    ny = [fbe' zeros(1,2048)];
  elseif FV == 3
    clear ny;
    load fha;
    ny = [fha' zeros(1,7168)];
  elseif FV == 4
    clear ny;
    load fhey;
    ny = [fhey'];
  elseif FV == 5
    clear ny
    load fmet;
    ny = [fmet' zeros(1,1024)];
  elseif FV == 6
    clear ny
    load fpay
    ny = [fpay' zeros(1,1024)];
  elseif FV == 7
    clear ny
    load fcats
    ny = [fcats'];
  elseif FV == 8
    clear ny
    benice = loadwav('benice.wav');
    ny = [(benice(1:16384)/max(abs(benice))+0.0119)'];
  end
end

if V == 2
  P = 2;

  W = input(['Please enter 1 for "project", 2 for "cataratas", 3 for "encyclopedia", 4 for "issos", ' ...
             '5 for "assos", 6 for "six", 7 for "the sentence", 8 for "aka", 9 for "at", 10 for "azure", ' ...
             '11 for "be", 12 for "bird", 13 for "boot", 14 for "call", 15 for "day", 16 for "eka", ' ...
             '17 for "epa", 18 for "eve", 19 for "father", 20 for "foot", 21 for "for", 22 for "go", ' ...
             '23 for "hate", 24 for "he", 25 for "ika", 26 for "it", 27 for "key", 28 for "let", 29 for "me", ' ...
             '30 for "met", 31 for "no", 32 for "obey", 33 for "opa", 34 for "pay", 35 for "read", ' ...
             '36 for "see", 37 for "she", 38 for "then", 39 for "thin", 40 for "to", 41 for "up", ' ...
             '43 for "vote", 44 for "we", 45 for "you", 46 for "zoo", 47 for "silence", ' ...
             '48 for "the bye sentence", 49 for "beback", 50 for "blows", 51 for "bruna", 52 for "adams": ']);

  if W == 1
    clear ny;
    load newvoice;
    ny = y(2700:2700+8191)';
  elseif W == 2
    clear ny;
    load catar;
    ny = ca(1900:1900+8191)';
  elseif W == 3
    clear ny;
    load encic;
    ny = en(1200:1200+8191)';
  elseif W == 4
    clear ny;
    load issos
    ny = is(1900:1900+8191)';
  elseif W == 5
    clear ny
    load assos
    ny = as(1900:1900+8191)';
  elseif W == 6
    clear ny
    load six
    ny = si(1:8192)';
  elseif W == 7
    clear ny;
    load myvoice;
    ny = x(9000:9000+32767)';
  elseif W == 8
    clear ny;
    load aka;
    ny = (ac+0.1656)';
  elseif W == 9
    clear ny;
    load at;
    ny = (at+0.1655)';
  elseif W == 10
    clear ny;
    load azure;
    ny = [(az+0.1651)' zeros(1,6144)];
  elseif W == 11
    clear ny;
    load be;
    ny = [(be+0.1654)' zeros(1,3072)];
  elseif W == 12
    clear ny;
    load bird;
    ny = [(bi+0.1658)' zeros(1,7168)];
  elseif W == 13
    clear ny;
    load boot;
    ny = [(bo+0.1652)'];
  elseif W == 14
    clear ny;
    load call;
    ny = [(cal+0.1654)' zeros(1,6144)];
  elseif W == 15
    clear ny;
    load day;
    ny = [(da+0.1645)' zeros(1,1024)];
  elseif W == 16
    clear ny;
    load eka;
    ny = [(ek+0.1653)'];
  elseif W == 17
    clear ny;
    load epa;
    ny = [(ep+0.1650)' zeros(1,6144)];
  elseif W == 18
    clear ny;
    load eve;
    ny = [(ev+0.1654)' zeros(1,4096)];
  elseif W == 19
    clear ny;
    load father;
    ny = [(fa+0.1648)' zeros(1,6144)];
  elseif W == 20
    clear ny;
    load foot;
    ny = [(foo+0.1653)' zeros(1,6144)];
  elseif W == 21
    clear ny;
    load for;
    ny = [(fo+0.1649)' zeros(1,6144)];
  elseif W == 22
    clear ny;
    load go;
    ny = [(go+0.1651)'];
  elseif W == 23
    clear ny;
    load hate;
    ny = [(ha+0.1657)' zeros(1,7168)];
  elseif W == 24
    clear ny;
    load he
    ny = [(he+0.1657)' zeros(1,2048)];
  elseif W == 25
    clear ny;
    load ika
    ny = [(ik+0.1654)' zeros(1,6144)];
  elseif W == 26
    clear ny;
    load it
    ny = [(it+0.1657)' zeros(1,3072)];
  elseif W == 27
    clear ny;
    load key;
    ny = [(ke + 0.1652)' zeros(1,2048)];
  elseif W == 28
    clear ny;
    load let;
    ny = [(le + 0.1657)'];
  elseif W == 30
    clear ny;
    load met;
    ny = [(met + 0.1653)'];
  elseif W == 31
    clear ny;
    load no;
    ny = [(no + 0.1646)' zeros(1,1024)];
  elseif W == 34
    clear ny;
    load pay;
    ny = [(pa + 0.1655)' zeros(1,1024)];
  elseif W == 36
    load see;
    ny = [(se+0.1653)' zeros(1,1024)];
  elseif W == 37
    load she;
    ny = [(sh+0.1654)'];
  elseif W == 38
    load then;
    ny = [(th+0.1656)' zeros(1,6144)];
  elseif W == 39
    load thin;
    ny = [(thi + 0.1655)' zeros(1,1024)];
  elseif W == 40
    load to;
    ny = [(to + 0.1649)' zeros(1,3072)];
  elseif W == 41
    load up;
    ny = [(up + 0.1653)' zeros(1,2048)];
  elseif W == 43
    load vote;
    ny = [(vo + 0.1654)' zeros(1,1024)];
  elseif W == 44
    clear ny;
    load we
    ny = [(we+0.1655)' zeros(1,2048)];
  elseif W == 45
    clear ny;
    load you
    ny = [(you + 0.1655)' zeros(1,2048)];
  elseif W == 46
    clear ny;
    load zoo
    ny = [(zo+0.1646)' zeros(1,2048)];
  elseif W == 47
    clear ny;
    load myvoice;
    ny = x(1:8192)';
  elseif W == 48
    clear ny;
    load bye;
    ny = [bye' zeros(1,9216)];
  elseif W == 49
    clear ny;
    beback = loadwav('beback.wav');
    ny = [(beback/max(abs(beback))+0.056)' zeros(1,4824)];
  elseif W == 50
    clear ny;
    blows = loadwav('blows.wav');
    ny = [(blows(1:16384)/max(abs(blows))+0.0034)'];
  elseif W == 51
    clear ny;
    br = loadwav('bruna.wav');
    ny = [(br/max(abs(br)) + 0.0155)' zeros(1,7268)];
  elseif W == 52
    clear ny;
    adam = loadwav('adamsfam.wav');
    ny = [(adam(1:32768)/max(abs(adam)) + 0.0081)'];
  end
end

n  =  length(ny) 

D  =  inputCEnter  the  finest  depth  for  Time  Splitting  : '); 

%  Implementing  the  Cosine  Packet  transform 

cp  =  CPAnalysis(ny,D,'Sine’); 
stree  =  CalcStatTree(cp, 'Entropy'); 

[btree,vtree]  =  BestBasis(stree,D);  %  Choosing  the  basis  by  applying  the  Best  Basis  Algorithm 
[n,L]  =  size(cp); 

%  Create  Bell 

bellname  =  'Sine'; 
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m  =  n/2''D/2; 

[bp,bm]  =  MakeONBelI(bellname,m); 

X  =  zeros(l,n); 

%  initialize  tree  traversal  stack 
stack  =  zeros(2,2^D+l); 
tp  =  zeros(l,n); 

V  =  zeros(l,n); 
compr  =  zeros(l,n); 
coef  =  zeros(l,n); 
ncoef  =  zeros(l4i); 

k=l; 

stack(:,k)  =  [0  0  ]'; 

V  =  zeros(l  :n); 
ind  =  0; 

le  =  zeros(l,2'^D); 
while^  >  0), 

d  =  stack(l^);  b  =  stack(2,k);  k=k-l ; 
if(btree(node(d,b))  ~=  0) ,  %  nonterminal  node 
k  =  k+1;  stack(:,k)  =  [(d+1)  (2*b)  ]'; 
k  =  k+1;  stackC,k)  =  [(d+1)  (2*b+l)]'; 
else 

c  =  cp(packet(d,b^),d+l)'; 
coef(l,b/(2M).*n+l:(b+l)/(2M).*n)  =  c; 
i  =  (b/(2M)*n+l); 
len  =  length(c); 

[  I,ND]  =  max(abs(c)); 
compr(l,b/(2^)*n+l)  =  length(c); 

%  Identifying  the  Frequency  content 

if  ND  <=  roimd(len/16) 

v(i)  =0.4; 

elseif  ND<=  round(len/8) 
v(i)  =  0.6; 

elseif  ND  <=  round(len/(16/3)) 
v(i)  =  0.7; 

elseif  ND  <=  length(c)/(2*P) 
v(i)  =  0.8; 
else 

[sI,sND]  =max(abs([coef(i:i+ND-3), 0,0,0, coef(i+ND+l  :i+len- 1)])); 
if  sND  >=  len/(2*P) 
v(i)  =  1; 

elseif  sND  >  round(len/(16/3)) 
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v(i)  =  0.8; 

elseif  sND  >  round(leii/8) 
v(i)  =  0.7; 

elseif  sND  >  round(len/16) 
v(i)  =  0.6; 
else 

v(i)  =  0.4; 
end 

end 


ec  =  sum(c.^2);  %  Computing  the  Energy  of  the  Coefficients 
vari  =  std(c); 

^(l,b/(2.''d).*n+l)=l; 
len  =  length(c); 
ind  =  ind+l; 
le(ind)  =  log2(len); 


v(i); 

rko  =  length(c)/16; 
ko  =  ND; 

fo  =  4000/Iength(c)*ND; 


toten  =  sum(coef.''2); 
i  =  (b/(2"^)*n+l); 

%  Applying  the  Adaptive  Thresholding  Compression  Scheme 
if  v(i)  <=  0.6 

if sum(coef(i:compr(i)-1+i).^2) < toten/n * len
if len < 2*n/(2^D)

nncoef(i:compr(i)-1+i) = comp((coef(i:compr(i)-1+i)),99.5);
else

nncoef(i:compr(i)-1+i) = comp((coef(i:compr(i)-1+i)),99.5);
end
else

if len < 2*n/(2^D)

nncoef(i:compr(i)-1+i) = comp((coef(i:compr(i)-1+i)),99.5);
else

nncoef(i:compr(i)-1+i) = comp((coef(i:compr(i)-1+i)),98.7); % 98.7
end
end


nc = nncoef(i:compr(i)-1+i);
end


if v(i) > 0.6

sumco = sum(coef(i:compr(i)-1+i).^2);
thres = 0.5*toten/n * len;
if sum(coef(i:compr(i)-1+i).^2) < toten/n * len; % it was 0.5*toten/n*len
if len < 2*n/(2^D)

nncoef(i:compr(i)-1+i) = comp((coef(i:compr(i)-1+i)),99.5); % 0.91% for all in "/be nice/"
else

nncoef(i:len+i-1) = comp(coef(i:len+i-1),99.5);
end
else

if len < 2*n/(2^D)

nncoef(i:compr(i)-1+i) = comp((coef(i:compr(i)-1+i)),99.5);
else

nncoef(i:compr(i)-1+i) = comp(coef(i:compr(i)-1+i),99.5);
end
end


nc = nncoef(i:compr(i)-1+i);
end


y  =  dct_iv(nc);  %  Inverse  transforming  each  interval 

%  Unfolding  each  interval  and  Reconstructing  the  time  sequence  after  compression 

[xc,xl,xr]  =  unfold(y,bp,bm); 
x(packet(d,b,n))  =  x(packet(d,b,n))  +  xc; 
if b>0,

x(packet(d,b-1,n)) = x(packet(d,b-1,n)) + xl;

else

x(packet(d,0,n)) = x(packet(d,0,n)) + edgeunfold('left',xc,bp,bm);
end

if b<2^d-1,

x(packet(d,b+1,n)) = x(packet(d,b+1,n)) + xr;
else

x(packet(d,b,n)) = x(packet(d,b,n)) + edgeunfold('right',xc,bp,bm);
end 
end 

end 

nind  =  sum(le>0); 
nle = le(1:nind);
figure(1),plot(ny), hold;
plot(tp,':'),hold off;
print -deps figure1
figure(2),plot(ny),hold
plot(v,':'),hold off;
print -deps figure2

mse = mean((ny - x).^2) % computing the mean square error between the original and
% the reconstructed sequence;

scoefin0 = sum(abs(coef)>0) % computing the number of non-zero coefficients before
% compression

sncoefin0 = sum(abs(nncoef)>0) % computing the number of non-zero coefficients after
% compression
figure(3),

plot(x);

figure(4)

subplot(2,2,1),plot(ny,'b')
title('a) "ISSOS", ORIGINAL PLOT')
subplot(2,2,2),plot(x,'b')

title('c) AFTER FIXED THRESHOLDING (0.78%)')
subplot(2,2,3),specgram(ny,[],1)
title('b) SPECTROGRAM')
subplot(2,2,4),specgram(x,[],1)
title('d) SPECTROGRAM (AFTER)')
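The routine above walks the best-basis tree with an explicit stack. A non-terminal node (d, b) is replaced by its two children (d+1, 2b) and (d+1, 2b+1). A terminal node covers the n/2^d samples that start at i = b/2^d·n + 1.

The Python sketch below only illustrates that traversal; the thesis code itself is MATLAB. The names `best_basis_leaves` and `nonterminal` are hypothetical. The set `nonterminal` stands in for the test `btree(node(d,b)) ~= 0`, and n is assumed to be divisible by 2^d at every leaf.

```python
def best_basis_leaves(nonterminal, n):
    """Visit the terminal nodes of a cosine-packet best-basis tree.

    nonterminal: set of (d, b) pairs that are split further; a hypothetical
        stand-in for the MATLAB test btree(node(d,b)) ~= 0.
    n: signal length, assumed divisible by 2**d at every leaf.
    Returns (d, b, start, length) per leaf in visiting order; start is
    0-based, so MATLAB's i = b/2^d*n + 1 equals start + 1.
    """
    stack = [(0, 0)]                          # root, as in stack(:,1) = [0 0]'
    leaves = []
    while stack:
        d, b = stack.pop()                    # last pushed node is read first
        if (d, b) in nonterminal:
            stack.append((d + 1, 2 * b))      # left child
            stack.append((d + 1, 2 * b + 1))  # right child, visited first
        else:
            seg_len = n // 2 ** d             # each leaf covers n/2^d samples
            leaves.append((d, b, b * seg_len, seg_len))
    return leaves


# Root split once, and its left half split again, for a 16-sample signal.
print(best_basis_leaves({(0, 0), (1, 0)}, 16))
# [(1, 1, 8, 8), (2, 1, 4, 4), (2, 0, 0, 4)]
```

The right child is pushed last, so sibling intervals are processed right to left, as in the MATLAB loop. The leaf lengths always sum to n.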


% Name: encp6.m and ndencomp.m

% Subject: Analysis, Denoising/Compression and Synthesis of Speech data;

% Description: These two routines contain the Denoising scheme applied prior to
% the compression schemes;

% The differences between the two routines are in:

% a) The Frequency Identification implementation; for example, ndencomp.m
% makes more use of the second largest coefficient than encp6.m does;

% b) The segmentation between voiced and unvoiced segments: encp6.m uses
% 500 Hz for female speech and 1,000 Hz for male speech; ndencomp.m
% uses 1,000 Hz for any gender;

%

% c) The detection of the presence of low-energy speech in a high-energy
% noisy background: ndencomp.m implements such a scheme, while encp6.m
% doesn't;

% 

%  d)  The  Adaptive  Thresholding  Compression  Scheme 
%  Written  and  adapted  by  J.  Roberto  V.  Martins,  October  1995; 


%  Encp6.m 
clear; 

%  Input  and  loading  of  speech  data 


V  =  input('Please  enter  "1"  for  female  voice  and  "2"  for  a  male  voice  : '); 
if V==1
P = 8;

FV = input(['Please enter 1 for the sentence, 2 for "be", 3 for "hate", 4 for "hey", 5 for "met", ' ...
    '6 for "pay", 7 for "cats", 8 for "benice": ']);
if FV==1
clear ny;
load fse;

ny = [fse' zeros(1,5120)];

elseif FV==2
clear ny;
load fbe;

ny = [fbe' zeros(1,2048)];
elseif FV==3
clear ny;
load fha;

ny = [fha' zeros(1,7168)];
elseif FV==4

clear ny;

load fhey;
ny = [fhey'];
elseif FV==5
clear ny
load fmet;

ny = [fmet' zeros(1,1024)];
elseif FV==6
clear ny
load fpay

ny = [fpay' zeros(1,1024)];

elseif FV==7
clear ny
load fcats
ny = [fcats'];
elseif FV==8
clear ny

benice = loadwav('benice.wav');

ny = [((benice(1:16384)/max(benice)) + 0.0119)'];
end
end

if V==2
P = 4;

W = input(['Please enter 1 for "project", 2 for "cataratas", 3 for "encyclopedia", 4 for "issos", 5 for "assos", ' ...
    '6 for "six", 7 for "the sentence", 8 for "aka", 9 for "at", 10 for "azure", 11 for "be", 12 for "bird", ' ...
    '13 for "boot", 14 for "call", 15 for "day", 16 for "eka", 17 for "epa", 18 for "eve", 19 for "father", ' ...
    '20 for "foot", 21 for "for", 22 for "go", 23 for "hate", 24 for "he", 25 for "ika", 26 for "it", 27 for "key", ' ...
    '28 for "let", 29 for "me", 30 for "met", 31 for "no", 32 for "obey", 33 for "opa", 34 for "pay", 35 for "read", ' ...
    '36 for "see", 37 for "she", 38 for "then", 39 for "thin", 40 for "to", 41 for "up", 43 for "vote", 44 for "we", ' ...
    '45 for "you", 46 for "zoo", 47 for "silence", 48 for "the bye sentence", 49 for "beback", 50 for "blows", ' ...
    '51 for "bruna", 53 for "sounds good": ']);


if W==1
clear ny;
load newvoice;
ny = y(2700:2700+8191)';
elseif W==2
clear ny;
load catar;


ny  =  ca(1900:1900+8191)'; 
elseif W==3
clear ny;
load encic;

ny = en(1200:1200+8191)';
elseif W==4
clear ny;
load issos

ny = is(1900:1900+8191)';
elseif W==5
clear ny
load assos

ny = as(1900:1900+8191)';
elseif W==6
clear ny
load six

ny = si(1:8192)';
elseif W==7
clear ny;
load myvoice;
ny = x(9000:9000+32767)';
elseif W==8
clear ny;
load aka;

ny = (ac+0.1656)';
elseif W==9
clear ny;
load at;

ny = (at+0.1655)';

elseif W==10
clear ny;
load azure;

ny = [(az+0.1651)' zeros(1,6144)];
elseif W==11
clear ny;
load be;

ny = [(be+0.1654)' zeros(1,3072)];
elseif W==12
clear ny;
load bird;

ny = [(bi+0.1658)' zeros(1,7168)];
elseif W==13
clear ny;
load boot;

ny = [(bo+0.1652)'];
elseif W==14
clear ny;
load call;

ny = [(cal+0.1654)' zeros(1,6144)];
elseif W==15
clear ny;
load day;

ny = [(da+0.1645)' zeros(1,1024)];


elseif W==16
clear ny;
load eka;

ny = [(ek+0.1653)'];
elseif W==17
clear ny;
load epa;

ny = [(ep+0.1650)' zeros(1,6144)];
elseif W==18
clear ny;
load eve;

ny = [(ev+0.1654)' zeros(1,4096)];
elseif W==19
clear ny;
load father;

ny = [(fa+0.1648)' zeros(1,6144)];
elseif W==20
clear ny;
load foot;

ny = [(foo+0.1653)' zeros(1,6144)];
elseif W==21
clear ny;
load for;

ny = [(fo+0.1649)' zeros(1,6144)];

elseif W==22
clear ny;
load go;

ny = [(go+0.1651)'];
elseif W==23
clear ny;
load hate;

ny = [(ha+0.1657)' zeros(1,7168)];
elseif W==24
clear ny;
load he

ny = [(he+0.1657)' zeros(1,2048)];
elseif W==25
clear ny;
load ika

ny = [(ik+0.1654)' zeros(1,6144)];
elseif W==26
clear ny;
load it

ny = [(it+0.1657)' zeros(1,3072)];
elseif W==27
clear ny;
load key;

ny = [(ke + 0.1652)' zeros(1,2048)];
elseif W==28
clear ny;
load let;

ny = [(le + 0.1657)'];
elseif W==30


clear ny;
load met;

ny = [(met + 0.1653)'];
elseif W==31
clear ny;
load no;

ny = [(no + 0.1646)' zeros(1,1024)];
elseif W==34
clear ny;
load pay;

ny = [(pa + 0.1655)' zeros(1,1024)];
elseif W==36
load see;

ny = [(se+0.1653)' zeros(1,1024)];
elseif W==37
load she;

ny = [(sh+0.1654)'];
elseif W==38
load then;

ny = [(th+0.1656)' zeros(1,6144)];
elseif W==39
load thin;

ny = [(thi + 0.1655)' zeros(1,1024)];
elseif W==40
load to;

ny = [(to + 0.1649)' zeros(1,3072)];
elseif W==41
load up;

ny = [(up + 0.1653)' zeros(1,2048)];
elseif W==43
load vote;

ny = [(vo + 0.1654)' zeros(1,1024)];
elseif W==44
clear ny;
load we

ny = [(we+0.1655)' zeros(1,2048)];
elseif W==45
clear ny;
load you

ny = [(you + 0.1655)' zeros(1,2048)];
elseif W==46
clear ny;
load zoo

ny = [(zo+0.1646)' zeros(1,2048)];
elseif W==47
clear ny;
load myvoice;
ny = x(1:8192)';
elseif W==48
clear ny;
load bye;

ny = [bye' zeros(1,9216)];
elseif W==49
clear ny;


load beback;

ny = [(beback-127.4452)' zeros(1,4824)];
elseif W==50
clear ny;

blows = loadwav('blows.wav');
ny = [(blows(1:16384)+0.3027)'];
elseif W==51
clear ny;

br = loadwav('bruna.wav');
ny = [(br/max(abs(br))+0.0155)' zeros(1,7268)];
elseif W==52
clear ny;

adam = loadwav('adamsfam.wav');
ny = [(adam(1:32768)/max(abs(adam))+0.0081)'];
elseif W==53
clear ny;
load eng16;

ny = [(eng16(1:16384) + 3.019e-4)'];
elseif W==54
clear ny;
load voiq;

ny = [voiq(1:32768)'];
end
end


%  Implementing  the  Cosine  Packet  Transform 
n  =  length(ny) 

D  =  input('Enter  the  finest  depth  for  Time  Splitting  : '); 
cp = CPAnalysis(ny,D,'Sine');
stree = CalcStatTree(cp,'Entropy');

[btree,vtree] = BestBasis(stree,D);

[n,L] = size(cp);


% Create Bell


bellname = 'Sine';
m = n/2^D/2;

[bp,bm] = MakeONBell(bellname,m);

%

x = zeros(1,n);

%

% initialize tree traversal stack
%

stack = zeros(2,2^D+1);
tp = zeros(1,n);

v = zeros(1,n);
compr = zeros(1,n);
coef = zeros(1,n);
ncoef = zeros(1,n);
k = 1;
stack(:,k) = [0 0]';
v = zeros(1,n);
ind = 0;

le = zeros(1,2^D);
while(k  >  0), 


d = stack(1,k); b = stack(2,k); k = k-1;
if(btree(node(d,b)) ~= 0), % nonterminal node
k = k+1; stack(:,k) = [(d+1) (2*b)]';
k = k+1; stack(:,k) = [(d+1) (2*b+1)]';
else

c = cp(packet(d,b,n),d+1)';

coef(1,b/(2^d)*n+1:(b+1)/(2^d)*n) = c;

i = (b/(2^d)*n+1);
len = length(c);

[I,ND] = max(abs(c));
compr(1,b/(2^d)*n+1) = length(c);

%  Identifying  the  Frequency  Content  of  each  interval 

if ND <= round(len/16)

[sI,sND] = max(abs([coef(i:i+ND-2),0,coef(i+ND:i+len-1)]));

if (4000/len*ND) > 125 % ND <= round(len/32)

if (4000/len*sND) < 400

if (4000/len*sND) >= 125
v(i) = 1;

ncoef(i:compr(i)-1+i) = [zeros(1,round(len/64)),coef(i+round(len/64):i+round(len/5)-1),zeros(1,len-round(len/5))]; % Denoising
else

if (4000/len*sND) >= 60
ncoef(i:compr(i)-1+i) = [zeros(1,round(len/64)),coef(i+round(len/64):i+round(len/16)-1),zeros(1,len-round(len/16))]; % Denoising
v(i) = 1;
else

if (4000/len*sND) <= 30
ncoef(i:len-1+i) = zeros(1,len); % Denoising
v(i) = 0;
else

ncoef(i:len-1+i) = [zeros(1,round(len/64)),coef(i+round(len/64):i+round(len/16)-1),zeros(1,len-round(len/16))]; % zeros(1,len);

v(i) = 1;

end

end

end

elseif sND < len/4
ncoef(i:compr(i)-1+i) = [zeros(1,len/64),coef(i+round(len/64):i+len-1)]; % Denoise

v(i) = 1;

else

ncoef(i:compr(i)-1+i) = [zeros(1,len/16),coef(i+len/16:i+len-1)]; % Denoising
v(i) = 2;
end
end

if (4000/len*ND) <= 125
if sND <= len/16

ncoef(i:len-1+i) = zeros(1,len); % Denoising
v(i) = 0;

elseif sND <= len/4
if (4000/len*sND) >= 300

ncoef(i:i+len-1) = [zeros(1,sND-1),coef(i+sND-1),zeros(1,len-sND)]; % [zeros(1,len/16), coef(i+len/16:i+len-1)];
v(i) = 1;
else

ncoef(i:i+len-1) = zeros(1,len); % Denoising
v(i) = 0;
end
else

if (4000/len*ND) < 64

ncoef(i:compr(i)-1+i) = zeros(1,len); % Denoising
v(i) = 0;
else

ncoef(i:compr(i)-1+i) = [zeros(1,len/4), coef(i+len/4:i+len-1)];
v(i) = 2;
end
end
end

elseif  ND  <  length(c)/P 

[sI,sND] = max(abs([coef(i:i+ND-2),0,coef(i+ND:i+len-1)]));
if sND < len/32
SND=sND

ncoef(i:compr(i)-1+i) = zeros(1,len); % Denoising
v(i) = 0;
else

v(i) = 1;

ncoef(i:compr(i)-1+i) = [zeros(1,len/16),coef(i+len/16:i+len-1)]; % Denoising
end
else

[sI,sND] = max(abs([coef(i:i+ND-3),0,0,0,coef(i+ND+1:i+len-1)]));
if sND >= length(c)/(2*P)
v(i) = 2;

ncoef(i:compr(i)-1+i) = [zeros(1,len/16),coef(i+len/16:i+len-1)]; % Denoising
else

v(i)=1;

ncoef(i:compr(i)-1+i) = [zeros(1,len/16),coef(i+len/16:i+len-1)]; % Denoising
end
end

ec = sum(c.^2);
vari = std(c);
tp(1,b/(2.^d).*n+1)=1;
len = length(c);
ind = ind+1;
le(ind) = log2(len);

v(i);

rko = length(c)/16;
ko = ND;

fo = 4000/length(c)*ND;

de(ind) = d;
be(ind) = b;
toten = sum(coef.^2);
i = (b/(2^d)*n+1);
if v(i) == 0

nncoef(i:compr(i)-1+i) = ncoef(i:compr(i)-1+i);
nc = nncoef(i:compr(i)-1+i);
end 

%  Applying  the  Adaptive  Thresholding  Compression  Scheme 


if v(i) == 1


if sum(ncoef(i:compr(i)-1+i).^2) < toten/n * len
if len < 2*n/(2^D)

nncoef(i:compr(i)-1+i) = comp((ncoef(i:compr(i)-1+i)),99.5);
else

nncoef(i:compr(i)-1+i) = comp((ncoef(i:compr(i)-1+i)),99.5);
end
else

if len < 2*n/(2^D)

nncoef(i:compr(i)-1+i) = comp((ncoef(i:compr(i)-1+i)),98.7);
else

nncoef(i:compr(i)-1+i) = comp((ncoef(i:compr(i)-1+i)),97.66);
end
end


nc = nncoef(i:compr(i)-1+i);
end


if v(i) == 2

sumco = sum(coef(i:compr(i)-1+i).^2);
thres = 0.5*toten/n * len;

if sum(coef(i:compr(i)-1+i).^2) < toten/n * len; % it was 0.5*toten/n*len
if len < 2*n/(2^D)

nncoef(i:compr(i)-1+i) = comp((ncoef(i:compr(i)-1+i)),99.5);
else

nncoef(i:len+i-1) = comp(ncoef(i:len+i-1),99.5);
end
else

if len < 2*n/(2^D)

nncoef(i:compr(i)-1+i) = comp((ncoef(i:compr(i)-1+i)),99.5);
else

nncoef(i:compr(i)-1+i) = comp(ncoef(i:compr(i)-1+i),99.5);
end
end


nc = nncoef(i:compr(i)-1+i);
end 


y  =  dct_iv(nc);  %  Inverse  Transforming  each  interval 

%  Unfolding  and  reconstructing  the  time  sequence  after  compression 

[xc,xl,xr]  =  unfold(y,bp,bm); 
x(packet(d,b,n)) = x(packet(d,b,n)) + xc;
if b>0,

x(packet(d,b-1,n)) = x(packet(d,b-1,n)) + xl;
else

x(packet(d,0,n)) = x(packet(d,0,n)) + edgeunfold('left',xc,bp,bm);

end

if b < 2^d-1,

x(packet(d,b+1,n)) = x(packet(d,b+1,n)) + xr;

else 

x(packet(d,b,n))  =  x(packet(d,b,n))  +  edgeunfold('right',xc,bp,bm); 
end 
end 

end 

nind  =  sum(le>0); 
nle = le(1:nind);

XX = x.*6;

figure(1),plot(ny), hold;
plot(tp,':'),hold off;
figure(2),plot(ny),hold
plot(v,':'), hold off;

mse = mean((ny - x).^2) % Computing the mean square error between the
% original and the reconstructed compressed one
scoefin0 = sum(abs(coef)>0)

% Computing the number of non-zero

% coefficients before denoising/compression
sncoefin0 = sum(abs(nncoef)>0) % Computing the number of non-zero
% coefficients after denoising/compression

figure(3),

subplot(2,2,1),plot(ny,'b');

title('MET, male speaker');

subplot(2,2,2),plot(x,'b');

title('AFTER DENOISING/COMPRESSION')

subplot(2,2,3),specgram(ny);

title('Original Spectrogram');

subplot(2,2,4),specgram(x);

title('AFTER DENOISING/COMPRESSION')
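encp6.m classifies each interval by turning the position ND of its largest DCT-IV coefficient into a frequency, 4000/len·ND. It then repeats the search with the peak masked to find the second peak sND.

Below is a minimal Python sketch of that step. It assumes 8 kHz sampling, so that the len coefficients span 0 to 4000 Hz. The names `peak_frequencies`, `band_hz` and `guard` are hypothetical. Setting `guard=1` reproduces the variant that masks the peak and both of its neighbours.

```python
def peak_frequencies(c, band_hz=4000.0, guard=0):
    """Frequencies (Hz) of the largest and second-largest |c| in one interval.

    Mirrors fo = 4000/len*ND, where ND is MATLAB's 1-based index of the
    largest magnitude; band_hz=4000 assumes 8 kHz sampling. guard=0 masks
    only the peak before the second search (the [...,0,...] form); guard=1
    also masks its two neighbours (the [...,0,0,0,...] form).
    """
    mags = [abs(x) for x in c]
    length = len(mags)
    nd = max(range(length), key=mags.__getitem__)   # first index of the maximum
    masked = list(mags)
    for k in range(nd - guard, nd + guard + 1):
        if 0 <= k < length:
            masked[k] = 0.0
    snd = max(range(length), key=masked.__getitem__)

    def to_hz(idx):
        return band_hz / length * (idx + 1)         # idx + 1 is the MATLAB index

    return to_hz(nd), to_hz(snd)


c = [0.0] * 64
c[3], c[4], c[20] = 5.0, 4.0, -2.0
print(peak_frequencies(c))           # (250.0, 312.5)
print(peak_frequencies(c, guard=1))  # (250.0, 1312.5)
```

The example shows why the neighbour masking matters. Without it, a coefficient adjacent to the main peak is reported as the second peak.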
% Ndencomp.m

%  Obs.:  This  routine  has  parts  a),  b)  and  c)  identical  to  the  same  parts  of 
%  routine  encp6.m.  Thus  we  are  only  presenting  the  complement,  which  begins 
%  in  part  d). 

compless  =  0; 
endearly  =  0; 

%  Identifying  the  Frequency  Content  of  each  interval 
if ND <= round(len/16); % it was len/64*3
if ND <= round(len/64)

[sI,sND] = max(abs([coef(i:i+ND-2),0,coef(i+ND:i+len-1)]));
if (4000/len*sND) <= 300 % try to make it better!!!
if len > n/(2^D)*8

coef(i+ND-1)=0;
coef(i+sND-1)=0;

[tI,tND] = max(abs(coef(i:i+len-1))); % (recovering the "ts" sound)

% implemented to solve problems like in "cats": it
% still needs to be improved!!

if tND < round(len/20)
v(i) = 0.1;
else

TND = tND;
compless = 1;
endearly = 1;

ncoef(i:i+len-1)=zeros(1,len); %[zeros(1,len/64),coef(i+len/64:i+len-1)]; % Denoising

%[zeros(1,tND-1),coef(i+tND-1),zeros(1,len-tND)]; %[zeros(1,len/64),coef(i+len/64:i+len-1)];
%[zeros(1,tND-1),coef(i+tND-1),zeros(1,len-tND)]; %[zeros(1,tND-1),coef(i+tND-1:i+len-1)];

v(i) = 0.5;
end

else

v(i) = 0.1;
end

elseif sND <= round(len/8)

ncoef(i:i+len-1) = [zeros(1,sND-1),coef(i+sND-1),zeros(1,len-sND)]; % Denoising
v(i) = 0.5;

elseif sND <= round(len/4)

ncoef(i:i+len-1) = [zeros(1,sND-1),coef(i+sND-1),zeros(1,len-sND)]; % Denoising
v(i)=1.0;
else

v(i) = 0.1;
end

else 
[sI,sND] = max(abs([coef(i:i+ND-2),0,coef(i+ND:i+len-1)]));

if sND < len/20 % see in encp6 how it was made for 125<ND<=250 & 125<=sND<400
v(i) = 0.1;

elseif sND <= round(len/8)
compless = 1; % flag to indicate to compress less
ncoef(i:i+len-1) = [zeros(1,len/32), coef(i+len/32:i+len-1)]; % Denoising
v(i) = 0.5;

elseif sND < length(c)/(2*P)
compless = 1;

ncoef(i:i+len-1) = [zeros(1,len/32),coef(i+len/32:i+len-1)]; % Denoising
v(i) = 1;

elseif sND <= round(len/P)
compless = 1;

ncoef(i:i+len-1) = [zeros(1,len/32),coef(i+len/32:i+len-1)]; % Denoising
v(i)=1.5;

else

compless = 1;

ncoef(i:i+len-1) = [zeros(1,len/32),coef(i+len/32:i+len-1)]; % Denoising
v(i) = 2;


end

end

elseif ND <= round(len/8)

ncoef(i:i+len-1) = [zeros(1,len/32), coef(i+len/32:i+len-1)]; % Denoising
v(i) = 0.25; % it was 0.2
elseif ND <= round(len/4)

ncoef(i:i+len-1) = [zeros(1,len/32), coef(i+len/32:i+len-1)]; % Denoising
v(i) = 0.5; % it was 0.4

elseif ND < length(c)/(2)

ncoef(i:i+len-1) = [zeros(1,len/32), coef(i+len/32:i+len-1)]; % Denoising
v(i) = 1; % it was 0.6
else

[sI,sND] = max(abs([coef(i:i+ND-3),0,0,0,coef(i+ND+1:i+len-1)]));
if sND >= length(c)/(2*P)
if ND <= len/2

ncoef(i:i+len-1) = [zeros(1,len/32), coef(i+len/32:i+len-1)]; % Denoising
v(i)=1.5; % it was 0.75;
elseif ND <= round(len*3/4)

%compless = 1; % included to help in the voice quality sentence
ncoef(i:i+len-1) = [zeros(1,len/32),coef(i+len/32:i+len-1)]; % Denoising
v(i) = 2; % it was 0.9
else

ncoef(i:i+len-1) = [zeros(1,len/32),coef(i+len/32:i+len-1)]; % Denoising
v(i) = 2.5; % it was 1.0
end

elseif sND > round(len/8)

ncoef(i:i+len-1) = [zeros(1,len/32),coef(i+len/32:i+len-1)]; % Denoising
v(i)= 1; % it was 0.6

elseif sND >= round(len/16)

ncoef(i:i+len-1) = [zeros(1,len/32),coef(i+len/32:i+len-1)]; % Denoising
v(i) = 0.5; % it was 0.4
else

ncoef(i:i+len-1) = [zeros(1,len/32),coef(i+len/32:i+len-1)]; % Denoising
v(i) = 0.25; % it was 0.2
end

end

EC  =  sum(c.^2); 

%  Computing  the  coefficients  energy 

ec(i:i+len-1) = ones(1,len) .* sum(c.^2);
es(i:i+len-1) = ones(1,len) .* sum(ny(i:i+len-1).^2);

% Computing the energy of each interval

vari = std(c);

%ncoef(1,b/(2^d).*n+1:(b+1)/(2^d).*n) = nc;

tp(1,b/(2.^d).*n+1)=1;

len = length(c);

ind = ind+1;

le(ind) = log2(len);

rko = length(c)/16;

ko = ND;

fo = 4000/length(c)*ND

de(ind) = d;

be(ind) = b;

toten = sum(coef.^2);

i = (b/(2^d)*n+1);

% Applying the Adaptive Thresholding Compression Scheme
if v(i) == 0.1

nncoef(i:i+len-1) = zeros(1,len);
nc = nncoef(i:compr(i)-1+i);
elseif v(i) <= 0.5

if sum(coef(i:compr(i)-1+i).^2) < toten/n * len
if len < 2*n/(2^D)

nncoef(i:compr(i)-1+i) = comp((ncoef(i:compr(i)-1+i)),99.5);
else

nncoef(i:compr(i)-1+i) = comp((ncoef(i:compr(i)-1+i)),99.5);
end
else

if len < 2*n/(2^D)

nncoef(i:compr(i)-1+i) = comp((ncoef(i:compr(i)-1+i)),98.7);
else

nncoef(i:compr(i)-1+i) = comp((ncoef(i:compr(i)-1+i)),97.66);
end
end


nc = nncoef(i:compr(i)-1+i);
end

if v(i) > 0.5 % it was 0.4; % it was = 1
sumco = sum(coef(i:compr(i)-1+i).^2);

thres = 0.5*toten/n * len;

if sum(coef(i:compr(i)-1+i).^2) < toten/n * len; % it was 0.5*toten/n*len
if len < 2*n/(2^D)
if compless == 1

nncoef(i:compr(i)-1+i) = comp((ncoef(i:compr(i)-1+i)),97.66);
else

nncoef(i:compr(i)-1+i) = comp((ncoef(i:compr(i)-1+i)),99.5);
end
else

if compless == 1

nncoef(i:compr(i)-1+i) = comp((ncoef(i:compr(i)-1+i)),98.7);
else

nncoef(i:compr(i)-1+i) = comp((ncoef(i:compr(i)-1+i)),99.5);
end

end

else

if len < 2*n/(2^D)
if compless == 1

nncoef(i:compr(i)-1+i) = comp((ncoef(i:compr(i)-1+i)),98.7);
else

nncoef(i:compr(i)-1+i) = comp((ncoef(i:compr(i)-1+i)),99.5);
end
else

%if compless == 1

%nncoef(i:compr(i)-1+i) = comp((ncoef(i:compr(i)-1+i)),98.7);

%else

nncoef(i:compr(i)-1+i) = comp(ncoef(i:compr(i)-1+i),99.5);

%end

end

end


nc = nncoef(i:compr(i)-1+i);


end 

if endearly == 1

[ma,md] = max(nncoef(i:i+len-1));
end

y = dct_iv(nc);

% Inverse Transforming each interval
if endearly == 1

y = y.*(abs(nncoef(i:i+len-1))>0);
end

% Unfolding and Reconstructing the Time sequence after compression
[xc,xl,xr] = unfold(y,bp,bm);
x(packet(d,b,n)) = x(packet(d,b,n)) + xc;
if b>0,

x(packet(d,b-1,n)) = x(packet(d,b-1,n)) + xl;
else

x(packet(d,0,n)) = x(packet(d,0,n)) + edgeunfold('left',xc,bp,bm);
end

if b<2^d-1,
x(packet(d,b+1,n)) = x(packet(d,b+1,n)) + xr;
else

x(packet(d,b,n)) = x(packet(d,b,n)) + edgeunfold('right',xc,bp,bm);
end
end

end

nind = sum(le>0);
nle = le(1:nind);
figure(1),plot(ny), hold;
plot(tp,':'),hold off;
figure(2),plot(ny),hold
plot(v,':'),hold off;
mse = mean((ny - x).^2)

% Computing the mean square error between the original signal and the signal after compression

scoefin0 = sum(abs(coef)>0)

% Computing the number of non-zero coefficients before
% denoising/compression
sncoefin0 = sum(abs(nncoef)>0)

% Computing the number of non-zero coefficients after
%  denoising/compression 

figure(3), 

plot(x); 

figure(4), 

plot(ec); 

figure(5), 

plot(es); 

first = 1;

nv = zeros(1,length(v));
nnv = zeros(1,length(v));
for i=1:length(v)
if v(i) > 0
if v(i) <= 0.5
nv(i)=1.0;
dist = i - first;

%if dist>=512
if ec(first) > toten/(n*32)*dist
nnv(first) = nv(first);
nnv(i) = nv(i);
end


first = i;
end

if v(i)>0.5
nnv(i)= 1.5;
end
end
end

XX = x.*4;

figure(6),plot(ny),hold
plot(nnv,':'),hold off;
figure(7)

subplot(2,2,1),plot(ny,'b')
title('"PAY", male speaker')

subplot(2,2,2),plot(x,'b')

title('AFTER DENOISING/COMPRESSION')

subplot(2,2,3),specgram(ny,[],1)

title('ORIGINAL SPECTROGRAM')

subplot(2,2,4),specgram(x,[],1)

title('AFTER DENOISING/COMPRESSION')
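In ndencomp.m, the adaptive thresholding branch reduces to choosing the percentage that is passed to comp(). Four things drive that choice:

- the interval label v(i) assigned by the frequency-identification step;
- whether the interval's energy is below its share of the total, toten/n·len;
- whether the interval is shorter than 2n/2^D;
- the compless flag.

The Python function below transcribes that branch structure as read from the listing. Its name and arguments are hypothetical. It returns None for the v(i) = 0.1 case, in which every coefficient of the interval is set to zero.

```python
def ndencomp_percentage(v, seg_energy, total_energy, n, seg_len, D, compless):
    """Percentage handed to comp() for one interval, or None to zero it.

    Transcribes the adaptive thresholding branches of ndencomp.m:
    seg_energy ~ sum(coef(i:i+len-1).^2), total_energy ~ toten,
    seg_len ~ len, compless ~ the "compress less" flag.
    """
    if v == 0.1:
        return None                                   # interval discarded
    low_energy = seg_energy < total_energy / n * seg_len
    short = seg_len < 2 * n / 2 ** D
    if v <= 0.5:                                      # branch: v(i) <= 0.5
        if low_energy:
            return 99.5
        return 98.7 if short else 97.66
    # branch: v(i) > 0.5
    if low_energy:
        if short:
            return 97.66 if compless else 99.5
        return 98.7 if compless else 99.5
    if short:
        return 98.7 if compless else 99.5
    return 99.5


# 1024-sample signal, D = 4: a 64-coefficient interval with v = 2 and little energy
print(ndencomp_percentage(2, 1.0, 1024.0, 1024, 64, 4, True))  # 97.66
```

Written out this way, compless only ever lowers the percentage, so more coefficients are kept. It has no effect on long, high-energy intervals.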


%  Name:  encptour.m  and  ndentour.m 

%  Subject:  Analysis,  Denoising/Compression,  Encoding,  Decoding  and  Synthesis 
%  of  speech  data 

% Description: These two routines were applied on top of encp6.m and
% ndencomp.m. These two programs perform the following tasks in addition to those
% already performed by encp6.m and ndencomp.m:

% a) Implementation of the Linear Quantizer for the Coefficients
% vector;

% b) Encoding of the Locations Vector;

% c) Encoding of the positions of the beginning of each segment;

% d) Huffman coding of the Coefficients Vector and of the Locations vector;

% e) Decoding of all the vectors on the Receiver's side;

% f) Reconstruction of the Denoised/compressed sequence at the
% receiver's side;

% Obs.: This code is put on top of the existing codes encp6.m and ndencomp.m
%  Written  by  J.  Roberto  V.  Martins,  October  1995. 

[X,L,seglens,de,be] = enc(nncoef,nle,de,be); % Encoding the locations and coefficients

[TX,prob,nprob,probdesc,N,nq,S] = quantx(X,QL); % Quantizing the coefficients

np = length(probdesc);

avwcoeff = huffcod(np,probdesc); % Huffman coding the coefficients vector
totcoeff = avwcoeff*length(TX);
debe = [de be];
sdebe = size(debe);

ordebe = sort(debe);

ndebe = zeros(1,length(ordebe));

countdb = 1;

ndebe(1,1) = ordebe(1);

lndb = length(ndebe);
for countdb=1:length(debe)-1
if ordebe(countdb+1) > ordebe(countdb)


countdb=countdb+1;
ndebe(countdb) = ordebe(countdb);

end

end

index = 0;
czero = 0;

for countdb = 1:length(ndebe)

if ndebe(countdb) > 0
index = index + 1;
nndebe(index) = ndebe(countdb);
else

czero = czero + 1;
end
end

probdebe = nndebe/sum(nndebe);
probdbde = fliplr(sort(probdebe));
nprobdbd = probdbde(1:length(probdbde));

avwdebe = huffcod(length(nndebe), nprobdbd); % coding the des and the bes (see chapter VII)
totdebe = avwdebe*(length(debe) - czero) + czero;

[DL,probl,lenprob] = difl(L);

%mDL  =  max(DL(2:length(DL))); 

totndl  =  0; 

pow = 1;

for indl = 1:length(DL)
while DL(indl) > 2^pow,
pow = pow+1;
end

totndl = totndl + pow; % calculating the necessary number of bits to transmit NDL (we're not using this)
pow = 1;
end
totndl = totndl + round(log2(DL(1))); % we're using the vector DL to transmit the locations


sn = 0;

qc = zeros(1,n);
nde = fliplr(de);
nbe = fliplr(be);
nseglens = fliplr(seglens);
nv = 1;

I = nbe./(2.^de).*n + 1;

for ns = 1:length(L)
for ni = 1:length(I)-1

if L(ns) >= I(ni)

if L(ns) <= I(ni+1)

sn = sn+1;

SN(sn)  =  ni; 

NDL(sn)  =  L(ns)  -  I(ni); 

end 

end 

end 

end 

pro  =  SN/sum(SN); 
nsimbsn  =  max(SN)  -  min(SN)  +1; 
realpr = zeros(1,nsimbsn);
indsn = 1;

realpr(indsn) = pro(indsn);
for isn = 1:length(pro)-1
if pro(isn+1) == pro(isn)
realpr(indsn) = realpr(indsn) + pro(isn+1);
else

realpr(indsn+1) = pro(isn+1);

indsn  =  indsn  +  1; 

end 

end 

desreapr  =  fliplr(sort(realpr)); 

RL(1) = DL(1); % Reconstructing L, the locations vector

for cl = 1:length(DL)-1
RL(cl+1) = RL(cl) + DL(cl+1);
end

for nv=1:length(nde)-1 %i:i+(b/(2^d)*n-1) 1:length(nseglens)

d = nde(nv);
b = nbe(nv);

i = (b/(2^d)*n+1);

nnc = qc(i:(nbe(1,nv+1)/(2^de(1,nv+1))*n)); %(i:2^seglens(v)+i-1);
thislen = nbe(nv+1)/2^de(nv+1)*n-i+1;

for z = i:i + (thislen-1) %(nbe(1,nv+1)/(2^de(1,nv+1))*n) %1:2^seglens(nv)

for t = 1:length(RL)
if z==RL(t)

qc(z) = TX(t)/(nq/2)*S;

end

end

end
un^rev = 0;

unfliex = 0;

nnc = qc(i:(nbe(1,nv+1)/(2^nde(1,nv+1))*n)); %(i:2^seglens(v)+i-1);

% Inverse transforming to the Time Domain, Unfolding and Reconstructing the Denoised/Compressed
% Decoded Speech Sequence

y = dct_iv(nnc);

[xc,xl,xr] = unfold(y,bp,bm);

x1(packet(d,b,n)) = x1(packet(d,b,n)) + xc;

if nv > 1 %nv = 1 %if b>0,

x1(packet(d,b-1,n)) = x1(packet(d,b-1,n)) + xl;
else

x1(packet(d,b,n)) = x1(packet(d,b,n)) + edgeunfold('left',xc,bp,bm);
end

if b < 2^d-1,

x1(packet(d,b+1,n)) = x1(packet(d,b+1,n)) + xr;
else

x1(packet(d,b,n)) = x1(packet(d,b,n)) + edgeunfold('right',xc,bp,bm);
end


end

figure(5),plot(ny),hold
plot(v,':'),hold off;

mse = mean((x - x1).^2) % Computing the mean square error between the denoised/compressed sequence in
% the transmitter and the decoded sequence in the receiver;
scoefin0 = sum(abs(coef)>0) % Computing the number of original non-zero coefficients
sncoefin0 = sum(abs(nncoef)>0) % Computing the number of non-zero coefficients

% after denoising/compression

sqcoefin0 = sum(abs(qc)>0) % Computing the number of non-zero coefficients after decoding

figure(6),

plot(x);

figure(7)

subplot(2,3,1),plot(ny)

title('Original "PAY", Female speaker')
subplot(2,3,2),plot(x)
title('After Denoising/Compression')
subplot(2,3,3),plot(x1)

title('After Decoding')

subplot(2,3,4),specgram(ny,[],1)
title('Original Spectrogram')
subplot(2,3,5),specgram(x,[],1)
title('After Denoising/Compression')
subplot(2,3,6),specgram(x1,[],1)

title('After Decoding')


TOTNBITS  =  totcoeff  +  totdebe  +  totndl 


TOTNSAMP  =  length(TX)  +  length(debe)  +  length(SN)  +  length(NDL) 
BITPSAMP  =  TOTNBITS/TOTNSAMP 

COMPRATIO  =  100  -  (TOTNBITS/(scoefin0*8)*100) 
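The bit budget above takes the average Huffman word length returned by huffcod and multiplies it by the number of transmitted symbols (totcoeff = avwcoeff*length(TX)). COMPRATIO then expresses TOTNBITS as a percentage saving relative to 8 bits per original non-zero coefficient.

The Python sketch below illustrates both quantities under that reading, and its function names are hypothetical. It builds the code with a standard binary heap rather than the permutation-matrix method of huffcod.m. This does not change the result, because every Huffman code for a given set of probabilities, including the minimum-variance one, has the same average length. That average equals the sum of the probabilities of all merged nodes, which is what the loop accumulates.

```python
import heapq


def huffman_average_length(probs):
    """Average codeword length, in bits/symbol, of a binary Huffman code.

    Each merge adds one bit to every symbol below the merged node, so the
    average length is the sum of all merged probabilities.
    """
    heap = [p for p in probs if p > 0]
    if len(heap) < 2:
        raise ValueError("need at least two symbols (no trivial solution)")
    heapq.heapify(heap)
    average = 0.0
    while len(heap) > 1:
        merged = heapq.heappop(heap) + heapq.heappop(heap)
        average += merged
        heapq.heappush(heap, merged)
    return average


def compression_ratio(total_bits, nonzero_before, bits_per_coeff=8):
    """COMPRATIO = 100 - TOTNBITS/(scoefin0*8)*100, i.e. percent of bits saved."""
    return 100 - total_bits / (nonzero_before * bits_per_coeff) * 100


print(huffman_average_length([0.5, 0.25, 0.25]))  # 1.5 (codewords 0, 10, 11)
print(compression_ratio(2000, 1000))              # 75.0
```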


%  Name:  Comp.m 
%  This  function  receives  as  input: 

% A vector "c" composed of coefficients and
% a percentage number "pcent";

% As an output, this function gives a vector of the same length whose non-zero
% components are the dominant (100 - pcent)% of the coefficients extracted from
% that original vector;

%  Written  by  J.  Roberto  V.  Martins  in  October  of  1995. 

function  cc  =  comp(c, pcent) 

d  =  sort(abs(c)); 

p  =  round(pcent/100*length(c)); 

for i = 1:length(c)

if p==0

cc(i)  =  c(i); 

elseif  abs(c(i))  <=  d(p) 

cc(i)  =  0; 

else  cc(i)  =  c(i); 

end 

end 

%d = (abs(c)>pcent/100*max(abs(c)));

%cc  =  c.*d; 
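comp.m keeps only the coefficients whose magnitude exceeds the p-th smallest magnitude, where p = round(pcent/100·length(c)). Below is a Python transliteration, for illustration only. The helper `matlab_round` is hypothetical; it reproduces MATLAB's round-half-away-from-zero, which Python's built-in round() does not.

```python
import math


def matlab_round(x):
    """MATLAB-style round: halves go away from zero (Python's round() does not)."""
    return int(math.copysign(math.floor(abs(x) + 0.5), x))


def comp(c, pcent):
    """Python transliteration of comp.m.

    p = round(pcent/100 * len(c)); every entry whose magnitude is <= the
    p-th smallest magnitude is set to zero, so roughly the largest
    (100 - pcent)% survive. Ties at the cut-off are zeroed as well.
    """
    p = matlab_round(pcent / 100 * len(c))
    if p == 0:
        return list(c)
    cutoff = sorted(abs(x) for x in c)[p - 1]   # d(p) with 1-based p
    return [x if abs(x) > cutoff else 0 for x in c]


coeffs = [1, -5, 3, 0.5, -2, 4, 0, 6, -7, 2.5]
print(comp(coeffs, 80))       # [0, 0, 0, 0, 0, 0, 0, 6, -7, 0]
print(comp([1, 2, 3], 99.5))  # [0, 0, 0]
```

p becomes the full length whenever pcent/100·length(c) is at least length(c) - 0.5. As a result, a call with pcent = 99.5 returns all zeros for any interval of 100 or fewer coefficients, as the second example shows.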


%  Name:  Enc.m 

%  This  function  receives  a  vector,  its  length  and  the  vectors  de 
%  and  be.  As  an  output,  it  returns: 

%  X,  a  vector  with  non-zero  coefficients  extracted  from  the  input  vector; 
% L, the vector containing the locations of the non-zero coefficients from
%  the  original  input  vector; 

%  Written  by  J.  Roberto  V.  Martins  in  October  of  1995 
function [X,L,seglens,de,be] = enc(vector,lenvec,de,be)

n = 0;
m = 0;

for i = 1:length(vector)
if abs(vector(i)) > 0
n = n+1;

X(n) = vector(i);
end
end

for j = 1:length(vector)
if abs(vector(j)) > 0
m = m+1;

L(m)=j; 

end 

end 

seglens  =  lenvec; 
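enc.m splits the sparse coefficient vector into the values X and their 1-based locations L, which are then quantized and differentially encoded separately. Below is a minimal Python sketch of that split and of its inverse at the receiver. The inverse performs the same placement as the qc(z) = TX(t)/(nq/2)*S loop, but without quantization.

The names `enc` and `dec` are hypothetical Python names. The sketch leaves out seglens (a copy of the lenvec input), de and be, because enc.m only copies or passes them through.

```python
def enc(vector):
    """Return (X, L): the non-zero entries and their 1-based positions."""
    X = [v for v in vector if v != 0]
    L = [j + 1 for j, v in enumerate(vector) if v != 0]   # MATLAB indices
    return X, L


def dec(X, L, n):
    """Receiver side: put each value back at its 1-based location."""
    out = [0] * n
    for value, pos in zip(X, L):
        out[pos - 1] = value
    return out


signal = [0, 3, 0, -1, 0, 0, 2]
X, L = enc(signal)
print(X, L)                       # [3, -1, 2] [2, 4, 7]
print(dec(X, L, len(signal)))     # [0, 3, 0, -1, 0, 0, 2]
```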


%  Name:  Difl.m 

% This function encodes a vector by transforming it into a
% differentially encoded vector. It receives the vector to be encoded as an input and returns:

% _ The differences vector;

% _ The probabilities vector in descending order, as well as its length.


function [DL,prob,lendl] = difl(vec)

DL(1) = vec(1);

for  z  =  2:length(vec) 

DL(z)  =  vec(z)  -  vec(z-l); 

end 

a=0; 

N = zeros(1,length(DL));

count = 1;

SDL = sort(DL);

NSDL(1) = SDL(1);
for p = 1:length(SDL)-1
if SDL(p+1)>SDL(p)
count = count + 1;

NSDL(1,count) = SDL(1,p+1);
end
end

N = zeros(1,length(NSDL));
for l = 1:length(DL)
for x = 1:length(NSDL)
if DL(l)==NSDL(x)

N(x) = N(x)+1;
end
end
end

prob  =  fliplr(sort(N)/sum(N)); 

lendl  =  length(prob); 
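difl.m turns the location vector into first differences, which are small and repetitive and therefore suit Huffman coding. The receiver rebuilds the locations with a running sum, RL(cl+1) = RL(cl) + DL(cl+1). Below is a Python sketch of both directions, assuming integer locations; `difl` and `undifl` are illustrative names only.

```python
from collections import Counter
from itertools import accumulate


def difl(vec):
    """Differential encoding as in difl.m: DL(1) = vec(1), DL(z) = vec(z) - vec(z-1).

    Also returns the relative frequency of each distinct difference, sorted
    in descending order (the input to the Huffman stage), and how many there are.
    """
    dl = [vec[0]] + [b - a for a, b in zip(vec, vec[1:])]
    counts = Counter(dl)
    prob = sorted((k / len(dl) for k in counts.values()), reverse=True)
    return dl, prob, len(prob)


def undifl(dl):
    """Receiver side: running sum, like RL(cl+1) = RL(cl) + DL(cl+1)."""
    return list(accumulate(dl))


dl, prob, lendl = difl([2, 4, 7, 9, 12])
print(dl, prob, lendl)   # [2, 2, 3, 2, 3] [0.6, 0.4] 2
print(undifl(dl))        # [2, 4, 7, 9, 12]
```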


%  Name:  Quantx.m 

%  This  function  performs  the  Linear  Quantization  proposed  in  this  thesis  for  a 
%  given  input  vector.  Inputs  are  the  vector  X  to  be  quantized  and 
%  the  number  of  quantization  levels  desired,  nq. 

%  Outputs  are: 

%  _  TX:  the  quantized  vector  to  be  transmitted; 

% _ prob: The vector of probabilities of all values in the input vector;

% _ nprob: The new vector of probabilities of all non-zero values in the input vector;

% _ probdesc: The new probabilities vector in descending order for input to the Huffman code;
% _ N: The histogram of the quantized vector, i.e. the number of occurrences of each of the nq+1 levels;

% _ nq: The number of quantization levels (equal to the input nq);

%  _  S:  The  scaling  factor  S,  i.e.  the  highest  present  absolute  value  in  the  vector; 

%  Written  by  J.Roberto  V.  Martins,  October  1995. 


function  [TX,prob,nprob,probdesc,N,nq,S]  =  quantx(X,nq) 
prob = zeros(1,nq+1);

S = max(abs(X));
normX = X/S;

TX = round(normX*nq/2);

%[N,Q] = hist(TX,length(TX));
N = zeros(1,nq+1);

STX = sort(TX);

for s=1:length(TX)
for p = -nq/2:1:nq/2
if TX(s) == p

N(1,p+nq/2+1) = N(1,p+nq/2+1) + 1;
end
end
end

prob = N/sum(N);
t = 0;

for s=1:length(prob)
if prob(s) > 0
t = t+1;

nprob(1,t)=prob(1,s);

end 

end 

probdesc  =  fliplr(sort(nprob)); 
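The quantizer maps each coefficient to one of the nq+1 integer levels between -nq/2 and nq/2, so the step size is 2S/nq and the rounding error is at most half a step, S/nq. The sketch below reproduces the quantx mapping and adds a matching dequantizer. The dequantizer, the sample data, and the use of `accumarray` for the level histogram are assumptions made for illustration, since the receiver side is not part of this listing.

```matlab
% Illustrative sketch: the quantx mapping plus an ASSUMED matching
% dequantizer (the receiver is not part of this listing).
% Assumption: nq is even, so the levels -nq/2..nq/2 are integers.
X  = [0.8 -0.3 0.05 -1.2 0.6];   % hypothetical input coefficients
nq = 8;                          % nq+1 = 9 levels
S  = max(abs(X));                % scaling factor, as in quantx
TX = round(X / S * nq / 2);      % integer levels in [-nq/2, nq/2]
counts = accumarray(TX(:) + nq/2 + 1, 1, [nq+1, 1])';  % occurrences per level
prob = counts / sum(counts);     % same role as prob in quantx
Xhat = TX * 2 * S / nq;          % dequantization: level times step size
maxerr = max(abs(X - Xhat));     % bounded by half a step, S/nq
```

Only S (or an equivalent scale) has to accompany the Huffman-coded levels for this reconstruction to be possible.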


% Name: Huffcod.m
% This function receives as input:
%   q, the number of symbols; and
%   p, the vector containing the probabilities of each symbol.
% As an output, it gives the average word length of the sequence.
% The function uses the code Huffman.m, written by K.L. Frack on 30 November 1993.
% Modifications made by J. Roberto V. Martins in October 1995.

% HUFFMAN finds the minimum variance Huffman code for the symbol
% probabilities entered by the user. The algorithm makes use of
% permutation matrices for the combination and sorting of probabilities.
% Permutation matrices are used because they provide a convenient record
% of operations, so that the codewords can then be constructed fairly easily
% once the combination and sorting of probabilities yields just two
% probabilities. At this point a zero is assigned to one of the
% probabilities and a one is assigned to the other. The permutation matrices
% are used to append additional zeros and ones as appropriate to obtain
% the final codeword for each symbol.
% Written by K.L. Frack for EC4580 Course Project
% Last update: 30 November 1993
function [avwlen] = huffcod(q,p)


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

% INPUT THE SYMBOLS TO BE CODED %

% INPUT THE NUMBER OF SYMBOLS TO BE CODED. NO TRIVIAL SOLUTION ALLOWED.
%q=0;  % q = number of symbols. Set to 0 to ensure that the loop
%      % will be executed at least once
%while q<3  % Need at least 3 symbols for a non-trivial solution
%q=input('Enter the number of symbols: ');
if q<3, disp('Trivial solution. Use a larger number of symbols.'), end
%end

% ENTER THE SYMBOL PROBABILITIES.
% Note: The probabilities must sum to 1.00 and must be entered in
% descending order for the algorithm to work properly. Since the algorithm
% will give erroneous results if these requirements are overlooked, error
% checking routines are included in later steps.
%disp(' ')
%disp('Enter the symbol probabilities (in descending order).')
%for i=1:q, p(i)=input([' Enter the probability of s', int2str(i),': ']); end

% ENSURE THERE ARE ENOUGH PROBABILITIES ENTERED
% If <RETURN> is inadvertently struck before a probability is entered, the
% input command could yield a probability vector which is too small. This
% causes the program to crash. This procedure prevents this from happening
% by setting all of the missing probabilities to zero. In this event the
% user can correct the wrong probabilities in a later step.
if length(p)<q, p=[p(:)' zeros(1,q-length(p))]; end  % pad as a row vector

% ERROR CHECK THE SYMBOL PROBABILITIES
correct='n';  % correct = 'n' ensures at least one pass through the error
              % checking loop.
count=0;      % count = 0 makes the loop a little simpler. It prevents the
              % program from prompting for a correction until the loop has
              % been executed at least once.
while correct ~= 'y'  % Keep looping until correct.
  if count>0  % This procedure will be executed only if there are errors
              % to be corrected.
    s=input('Enter the index of the incorrect probability: ');
    p(s)=input(['Enter the correct probability for s',int2str(s),': ']);
  end
  count=1;
  % Display the table.
  disp(' ')
  disp('Index  Symbol  Probability')
  disp('---------------------------')
  for i=1:q
    is=[int2str(i) blanks(6)]; is=is(1:7);    % makes a string from the index.
    ps=[num2str(p(i)) '000000']; ps=ps(1:6);  % makes a string from the prob.
    disp([' s',is,'  ',ps])                   % displays the table
  end
  if abs(sum(p)-1)>1e-8   % Ensures probabilities sum to one.
    correct = 'n';        % originally preceded by a "beep"
    disp(' ')
    disp('Error --> Probabilities do not sum to 1.00!')
  elseif max(diff(p))>0   % Ensures probabilities are in descending order.
    correct = 'n';        % originally preceded by a "beep"
    disp(' ')
    disp('Error --> Probabilities are not in descending order!')
  else
    correct=input('Is the table correct? (Enter y or n): ','s');
    % Asks the user to verify that all the probabilities are correctly
    % entered. An 'n' response will prompt the user for corrections.
  end
end, clear correct is ps count
p=p';  % p must be a column vector
pp=p;  % pp = extra copy of the original probability vector

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% FORM THE Q-2 PERMUTATION MATRICES (LEFT MULTIPLICATION) %
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

% INITIALIZE EACH MATRIX TO THE ZERO MATRIX OF APPROPRIATE DIMENSION
for i=1:q-2, eval(['P' int2str(i) '=zeros(q-i,q-i+1);']), end

% SUM THE LOWEST TWO PROBABILITIES AND DETERMINE NEW SORTED LOCATIONS
for k=1:q-2  % do for each of the q-2 permutation matrices
  Sum=p(q+1-k)+p(q-k);  % sum the two lowest (and smallest) probabilities
  i=1;
  while Sum < p(i)  % find the highest location in the p vector for the sum
    eval(['P' int2str(k) '(i,i) = 1;'])
    i=i+1;
  end
  eval(['P' int2str(k) '(i,q-k:q-k+1) = [1 1];'])  % This is the spot
  while i<q-k  % form the rest of the matrix with the remaining probabilities
    i=i+1;
    eval(['P' int2str(k) '(i,i-1) = 1;'])
  end
  p=eval(['P' int2str(k)])*p;  % multiply permutation matrix and probability
                               % vector to get the new probability vector.
end, clear p Sum k

%%%%%%%%%%%%%%%%%%%%%%%%
% FORM THE SYMBOLS %
%%%%%%%%%%%%%%%%%%%%%%%%

% The symbols are formed using matrices of characters. The characters are
% ones, zeros, and blanks. Each row in a matrix represents a codeword. The
% final codewords are in the s0 matrix. Blanks are included in the matrices
% in order to make this part of the algorithm work efficiently. These blanks
% are removed in a later step.

% INITIALIZE ALL CODEWORD MATRICES TO BLANKS (Blank = 32 in ASCII)
for i=1:q-1, eval(['s' int2str(i-1) '=32*ones(q-i+1,q-i);']), end

% SET RIGHTMOST CODEWORD VECTOR TO ['0' '1']' (0=48 in ASCII, 1=49 in ASCII)
eval(['s' int2str(q-2) ' = [48; 49];'])

% WORK FROM RIGHT TO LEFT USING THE P MATRICES TO FORM THE CODEWORDS
% The codewords are formed from matrices of zeros (ASCII 48), ones (ASCII 49),
% or blanks (ASCII 32). The rightmost matrix is the one named s followed by
% the value of q-2 (s3 for q = 5) and holds the [0 1]' vector. s0 is the
% leftmost matrix and contains the final codewords (except for extra blanks).
for i=q-2:-1:1
  twosum=find((sum((eval(['P' int2str(i)])'))')==2);
  % twosum is the index of the row of the permutation matrix with two ones.
  % This is the row which accomplishes the addition of the two lowest
  % probabilities. Its index indicates where the sum is to be placed in the
  % new probability vector. This index also gives information on how to
  % form the codewords.
  onesum=find((sum((eval(['P' int2str(i)])'))')==1);
  % onesum has the indices of all the rows of the permutation matrix with
  % only single ones. The indices indicate how the probabilities will be
  % placed in the new probability vector. These indices also give
  % information on how to form the codewords.
  eval(['s' int2str(i-1) '(1:q-i-1,1:q-i-1)=s' int2str(i) '(onesum,1:q-i-1);'])
  eval(['s' int2str(i-1) '(q-i,1:q-i-1)=s' int2str(i) '(twosum,1:q-i-1);'])
  eval(['s' int2str(i-1) '(q-i+1,1:q-i-1)=s' int2str(i) '(twosum,1:q-i-1);'])
  eval(['s' int2str(i-1) '(q-i,q-i)=48;'])
  eval(['s' int2str(i-1) '(q-i+1,q-i)=49;'])
  % The five lines above place the appropriate ones, zeros, and blanks in the
  % codeword matrices as the progression moves from right to left.
  eval(['clear P' int2str(i) ' s' int2str(i)])
end, clear onesum twosum

% FIND AND REMOVE THE BLANKS FROM EACH CODEWORD AND COMPUTE WORD LENGTHS
for i=1:q
  eval(['S' int2str(i) ' = (s0(i,:));'])   % s0 has all the needed information
  eval(['c=find(S' int2str(i) '==32);'])   % find all the blanks
  eval(['S' int2str(i) '(c) = [];'])       % remove all the blanks
  eval(['S' int2str(i) ' = char(S' int2str(i) ');'])  % convert from ASCII (char replaces the obsolete setstr)
  eval(['L(i)=length(S' int2str(i) ');'])  % compute the length of each codeword
end, clear s0 c
avwlen = sum(L*pp);

%%%%%%%%%%%%%%%%%%%%%%%%
% DISPLAY THE OUTPUT %
%%%%%%%%%%%%%%%%%%%%%%%%
disp(' ')
disp('Symbol  Probability  Code Word')
disp('-------------------------------')
for i=1:q
  is=[int2str(i) blanks(6)]; is=is(1:7);
  ps=[num2str(pp(i)) '000000']; ps=ps(1:6);
  disp([' s',is,'  ',ps,'  ',eval(['S' int2str(i)])])
end, clear is ps i q, disp(' ')

% COMPUTE AND DISPLAY AVERAGE WORD LENGTH
L_avg=sum(L*pp);  % L is a row vector and pp a column vector
disp(['The average word length is ', num2str(L_avg)])

% COMPUTE AND DISPLAY THE ENTROPY
H=sum(pp.*log2(1./pp));
disp(['The entropy is ', num2str(H)])

% COMPUTE AND DISPLAY VARIANCE
var=sum(((L_avg-L).^2)*pp);
disp(['The variance is ', num2str(var)])
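huffcod records each merge in a permutation matrix and rebuilds the codewords from s0. A shorter way to check its average word length is to track only the codeword lengths: each time the two least probable groups of symbols are merged, every symbol in them gains one bit. The sketch below does this for hypothetical probabilities. It is a different bookkeeping from the permutation-matrix method of K.L. Frack, and its tie-breaking does not enforce minimum variance, so individual lengths may differ from those produced by huffcod. The average length, however, is the same for every Huffman code of a given source and must satisfy H ≤ L_avg < H + 1.

```matlab
% Illustrative sketch (not huffcod): Huffman codeword lengths obtained by
% repeatedly merging the two least probable groups. Ties are broken by
% sort order, NOT by the minimum-variance rule used in huffcod.
pp = [0.4 0.2 0.2 0.1 0.1];      % hypothetical probabilities, descending
q = numel(pp);
L = zeros(1, q);                 % codeword length of each symbol
groups = num2cell(1:q);          % each group lists the symbols it contains
w = pp;                          % current group probabilities
while numel(w) > 1
  [w, order] = sort(w);          % ascending: the two smallest come first
  groups = groups(order);
  merged = [groups{1}, groups{2}];
  L(merged) = L(merged) + 1;     % each merged symbol gains one more bit
  w = [w(1) + w(2), w(3:end)];
  groups = [{merged}, groups(3:end)];
end
L_avg = L * pp';                 % average word length
H = sum(pp .* log2(1 ./ pp));    % source entropy
```

For these probabilities the average word length is 2.2 bits against an entropy of about 2.12 bits, and the lengths satisfy the Kraft equality sum(2.^(-L)) = 1, as any Huffman code does.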